Multiplexed deterministic assembly of DNA libraries

ABSTRACT

The present disclosure relates to methods of joining three or more double-stranded (ds) or single-stranded (ss) DNA molecules of interest in vitro or in vivo. The method allows the joining of a large number of DNA fragments, in a deterministic fashion. It can be used to rapidly generate nucleic acid libraries that can be subsequently used in a variety of applications that include, for example, genome editing and pathway assembly. Kits for performing the method are also disclosed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of priority to U.S. Provisional Application Ser. No. 62/753,254, filed Oct. 31, 2018, which is herein incorporated by reference in its entirety for all purposes.

FIELD

The present disclosure is directed to compositions and methods for joining single-stranded and/or double-stranded nucleic acid molecules, permitting in vitro or in vivo assembly of multiple nucleic acid molecules with overlapping terminal sequences in a single reaction. The disclosed methods and compositions can be useful for deterministic assembly of fragments of nucleic acid sequences and can be used for editing any DNA sequence such as, for example, plasmids, cosmids or specific genes in the genome of desired host cells or organisms.

STATEMENT REGARDING SEQUENCE LISTING

The Sequence Listing associated with this application is provided in text format in lieu of a paper copy, and is hereby incorporated by reference into the specification. The name of the text file containing the Sequence Listing is ZYMR_029_01US_SeqList_ST25.txt. The text file is about 262 KB, was created on Oct. 31, 2019, and is being submitted electronically via EFS-Web.

BACKGROUND

Traditionally, nucleic acid assemblies such as plasmid or linear DNA are generated one at a time in a deterministic fashion, a process that can be slow, expensive and labor-intensive. In contrast, current pooled approaches for generating libraries of complex nucleic acid assemblies can enable the generation of many assemblies at once, but often result in libraries representing all possible combinations between the sets of parts in the assembly. Such approaches are non-deterministic and combinatorial and can also be time-consuming, labor-intensive and expensive, especially in circumstances where only a subset of sequences is the desired product of the assembly reaction.

Thus, there is a need in the art for new methods for generating complex nucleic acid assemblies, which do not suffer from the aforementioned drawbacks inherent with traditional methods for generating nucleic acid assemblies.

SUMMARY

In one aspect, provided herein is a composition comprising a mixture of polynucleotides, the mixture comprising: a first pool containing pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide; and a second pool of insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to a 3′ end of a first polynucleotide and a second assembly overlap sequence at its opposing 3′ end that is complementary to a 5′ end of a second polynucleotide in a pair of polynucleotides from the first pool. In some cases, the composition further comprises a cloning vector, wherein, for each pair in the first pool, a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide comprises sequence complementary to the cloning vector. In some cases, each polynucleotide from the first pool is selected such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector. In some cases, the specified threshold is between 5 and 15 contiguous nucleotides. In some cases, the composition further comprises a polymerase. In some cases, the polymerase is strand-displacing or non-strand displacing. In some cases, the polymerase is non-strand displacing and the composition further comprises a crowding agent. In some cases, the crowding agent is polyethylene glycol (PEG). In some cases, the PEG is used at a concentration of from about 3 to about 7% (weight/volume). In some cases, the PEG is selected from PEG-200, PEG-4000, PEG-6000, PEG-8000 or PEG-20,000.
In some cases, the polymerase is strand displacing and the composition further comprises a single-stranded binding protein. In some cases, the single strand DNA binding protein is an extreme thermostable single-stranded DNA binding protein (ET SSB), E. coli recA, T7 gene 2.5 product, phage lambda RedB or Rac prophage RecT. In some cases, the composition further comprises a 5′-3′ exonuclease. In some cases, the composition further comprises a ligase. In some cases, each pair in the first pool is double-stranded DNA (dsDNA) or single-stranded DNA (ssDNA). In some cases, each insert polynucleotide in the second pool is dsDNA or ssDNA. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to a target genomic locus in a host cell. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a gene that is part of a metabolic pathway. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a functional domain or one or more proteins. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide are linked together in a single construct, wherein the single construct comprises one or more recognition sequences for one or more site-specific nuclease(s) between the first polynucleotide and the second polynucleotide. In some cases, the one or more recognition sequences for one or more site-specific nuclease(s) comprises a homing endonuclease recognition sequence. In some cases, the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprises 1 or more nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool.
In some cases, the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprises about 25 nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool. In some cases, each insert polynucleotide in the second pool comprises one or more payload sequences located between the first assembly overlap sequence and the second assembly overlap sequence. In some cases, the one or more payload sequences are selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, terminators, unique identifier sequence or portions thereof. In some cases, each pair of first and second polynucleotides in the first pool comprises sequence corresponding to a different target genomic locus in a host cell as compared to each other pair in the first pool. In some cases, each pair of first and second polynucleotides in the first pool comprises sequence corresponding to the same target genomic locus in a host cell. In some cases, each payload sequence in the insert polynucleotides in the second pool is different from the payload sequence in each other insert polynucleotide in the second pool. In some cases, each payload sequence in the insert polynucleotides in the second pool is the same as the payload sequence in each other insert polynucleotide in the second pool. In some cases, the site-specific nuclease(s) is one or more of restriction endonuclease(s), Type IIs endonuclease(s), homing endonuclease(s), RNA-guided nuclease(s), DNA-guided nuclease(s), zinc-finger nuclease(s), Transcription activator-like effector nuclease(s) (TALEN(s)) or nicking enzyme(s).
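
By way of illustration only, the selection of first-pool polynucleotides that share no common sequence beyond a specified threshold could be sketched as follows. This is a minimal sketch and not a disclosed implementation: the function names, the greedy selection strategy, the check against both strands, and the reading of "beyond a specified threshold" as "a shared contiguous run longer than the threshold" are all assumptions made for the example. The input sequences are assumed to have the designed assembly overlap sequences already removed.

```python
# Illustrative sketch only; names and strategy are hypothetical assumptions.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all contiguous k-nucleotide windows in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_run(a, b, threshold):
    """True if a and b share a contiguous run longer than `threshold`
    nucleotides, checking b on both strands (an assumption of this sketch)."""
    k = threshold + 1
    a_kmers = kmers(a, k)
    return bool(a_kmers & kmers(b, k)) or bool(
        a_kmers & kmers(reverse_complement(b), k)
    )


def select_compatible_pairs(candidates, threshold=10):
    """Greedily keep (name, first, second) pairs whose members share no
    run longer than `threshold` with each other or with any kept member."""
    chosen = []
    for name, first, second in candidates:
        accepted = [seq for _, f, s in chosen for seq in (f, s)]
        clash = shares_run(first, second, threshold) or any(
            shares_run(new, old, threshold)
            for new in (first, second)
            for old in accepted
        )
        if not clash:
            chosen.append((name, first, second))
    return chosen
```

A greedy pass is order-dependent and does not guarantee the largest possible compatible pool; it is shown here only because it is the simplest way to make the threshold criterion concrete.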

In another aspect, provided herein is a method for generating libraries of polynucleotides, the method comprising: (a) combining a first pool of polynucleotides and a second pool of polynucleotides, wherein the first pool contains pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide, wherein the second pool contains insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to a 3′ end of a first polynucleotide and a second assembly overlap sequence at its opposing 3′ end that is complementary to a 5′ end of a second polynucleotide in a pair of polynucleotides from the first pool; and (b) assembling the first pool and the second pool into a library of polynucleotides, wherein each polynucleotide in the library comprises an insert polynucleotide from the second pool and a pair of first polynucleotides and second polynucleotides from the first pool, wherein the assembling is performed via in vitro cloning methods or in vivo cloning methods. In some cases, the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprises 1 or more nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool. In some cases, the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprises about 25 nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool.
In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide are linked together in a single construct, wherein the single construct comprises one or more recognition sequences for one or more site-specific nuclease(s) between the first polynucleotide and the second polynucleotide. In some cases, the one or more recognition sequences for one or more site-specific nuclease(s) comprises a homing endonuclease recognition sequence. In some cases, the linked single construct is produced by joining individual first and second polynucleotides via splicing and overlap-extension PCR (SOE-PCR), restriction-ligation, blunt-end ligation, overlap-based assembly method, recombination-based method, or any other enzymatic or chemical method of joining the first and second polynucleotides, or by synthesizing the single construct directly. In some cases, the method further comprises combining a cloning vector with the first pool and the second pool during step (a), wherein opposing ends of the cloning vector comprise sequence complementary to a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide for each pair in the first pool. In some cases, the method further comprises combining a cloning vector with the first pool prior to step (a), wherein opposing ends of the cloning vector comprise sequence complementary to a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide for each pair in the first pool. In some cases, the cloning vector and the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair from the first pool comprise one or more recognition sequences for one or more site-specific nucleases.
In some cases, the method further comprises generating single-stranded complementary overhangs between the opposing ends of the cloning vector and the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair from the first pool by adding the one or more site-specific nucleases for the one or more recognition sequences. In some cases, the method further comprises ligating the single-stranded complementary overhangs between the opposing ends of the cloning vector and the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair from the first pool. The ligating can be performed using a DNA ligase. In some cases, step (b) results in a circular product comprising an insert polynucleotide from the second pool, a first and second polynucleotide from a pair from the first pool and the cloning vector. In some cases, the first pool is generated by selecting pairs of polynucleotide sequences from a larger set of such sequences such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector. In some cases, the specified threshold is between 5 and 15 contiguous nucleotides. In some cases, the assembly is an in vitro cloning method, wherein the mixture of the first pool and the second pool is heated to partially or fully denature polynucleotides present in the first and the second pools, then cooled to room temperature before assembly. In some cases, prior to step (a), the first pool of polynucleotides is generated by combining a mixture containing each first polynucleotide from the pairs of polynucleotides with a mixture containing each second polynucleotide from the pairs of polynucleotides.
In some cases, each pair in the first pool is double-stranded DNA (dsDNA) or single-stranded DNA (ssDNA). In some cases, each insert polynucleotide in the second pool is dsDNA or ssDNA. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to a target genomic locus in a host cell. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a gene that is part of a metabolic pathway. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a functional domain or one or more proteins. In some cases, each insert polynucleotide in the second pool comprises one or more payload sequences located between the first assembly overlap sequence and the second assembly overlap sequence. In some cases, the one or more payload sequences are selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, terminators, unique identifier sequence or portions thereof. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to a different target genomic locus in a host cell as compared to each other pair in the first pool. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to the same target genomic locus in a host cell. In some cases, each payload sequence in the insert polynucleotides in the second pool is different from the payload sequence in each other insert polynucleotide in the second pool.
In some cases, each payload sequence in the insert polynucleotides in the second pool is the same as the payload sequence in each other insert polynucleotide in the second pool. In some cases, each insert polynucleotide in the second pool is generated by: (i) performing a polymerase chain reaction (PCR) on a mixture comprising the payload sequence, a forward primer and a reverse primer, wherein the forward primer comprises from 5′ to 3′, a short stretch of one or more nucleotides complementary to the payload sequence, the first assembly overlap sequence, one or more recognition sequences for one or more site-specific nuclease(s), the second assembly overlap sequence and a second stretch of one or more nucleotides complementary to the payload sequence and wherein the reverse primer comprises sequence complementary to the payload sequence, wherein the PCR generates a PCR product comprising from 5′ to 3′, the short stretch of nucleic acid complementary to the payload sequence, the first assembly overlap sequence, the one or more site-specific nuclease recognition sequence(s), the second assembly overlap sequence and the payload sequence; (ii) circularizing the PCR product via an assembly method selected from the group consisting of splicing and overlap-extension PCR (SOE-PCR), restriction-ligation, blunt-end ligation, overlap-based assembly method, and recombination-based method, or any other enzymatic or chemical method for joining two DNA molecules; and (iii) linearizing the circularized PCR product with one or more site-specific nuclease(s) that recognizes the one or more site-specific nuclease recognition sequence(s), thereby generating the second pool of polynucleotides. In some cases, the site-specific nuclease(s) is one or more of restriction endonuclease(s), Type IIs endonuclease(s), homing endonuclease(s), RNA-guided nuclease(s), DNA-guided nuclease(s), zinc-finger nuclease(s), TALEN(s) or nicking enzyme(s).
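
By way of illustration only, the deterministic pairing that results from giving each insert polynucleotide assembly overlap sequences keyed to one specific pair can be sketched in silico as follows. This is a minimal sketch under stated assumptions: every molecule is modeled as a top-strand string, "complementary" ends are modeled as identical top-strand sequence (as would anneal after strand resection in an overlap-based assembly), and the function names, the `overlap` parameter and the example sequences are hypothetical. A payload of length zero models a deletion.

```python
# Illustrative sketch only; names, parameters and sequences are hypothetical.


def design_insert(first, second, payload, overlap=25):
    """Build an insert: the last `overlap` nt of the first polynucleotide,
    then the payload, then the first `overlap` nt of the second."""
    if len(first) < overlap or len(second) < overlap:
        raise ValueError("polynucleotides must be at least `overlap` nt long")
    return first[-overlap:] + payload + second[:overlap]


def assemble_pool(pairs, inserts, overlap=25):
    """Predict products of a pooled reaction.

    pairs:   dict mapping a name to a (first, second) tuple
    inserts: list of insert sequences
    Returns a dict mapping each pair name to the list of products
    (first + payload + second) formed with inserts whose two overlaps
    match that pair."""
    products = {}
    for name, (first, second) in pairs.items():
        for insert in inserts:
            if len(insert) < 2 * overlap:
                continue  # too short to carry both overlaps
            if insert.startswith(first[-overlap:]) and insert.endswith(
                second[:overlap]
            ):
                payload = insert[overlap:len(insert) - overlap]
                products.setdefault(name, []).append(first + payload + second)
    return products
```

In this model, pooling N pairs with N inserts designed one per pair yields N predicted products rather than the N × N products of a combinatorial assembly, provided the overlaps are unique across the pool.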

In yet another aspect, provided herein is a method for generating libraries of polynucleotides, the method comprising: (a) amplifying via polymerase chain reaction (PCR) a first pool of polynucleotides, wherein the first pool contains pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide, and wherein each first polynucleotide and each second polynucleotide in a pair comprises a 5′ end and a 3′ end, wherein the amplifying introduces a common overlap sequence comprising one or more recognition sequences for one or more site-specific nucleases onto the 5′ end of a first polynucleotide and the 3′ end of a second polynucleotide in a pair from the first pool; (b) assembling each pair of first polynucleotides and second polynucleotides from the first pool into a single nucleic acid fragment by utilizing the common overlap sequence, wherein the single nucleic acid fragment for each pair comprises a first polynucleotide and second polynucleotide separated by the common overlap sequence from the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide, and wherein the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment for each pair are located on opposing terminal ends of the single nucleic acid fragment, distal to the one or more site-specific nuclease recognition sequence(s); (c) combining the single nucleic acid fragments for each pair with a second pool containing insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to the 3′ end of the first polynucleotide present within the single nucleic acid fragment and a second assembly overlap sequence at its opposing 3′ end that is complementary to the 5′ end of the second polynucleotide present within the single nucleic acid fragment; (d) assembling the first pool and the second pool into a third pool of circularized products, wherein the assembling is performed via in vitro or in vivo overlap assembly methods, and wherein each circularized product in the third pool comprises an insert sequence from the second pool and a pair of first polynucleotides and second polynucleotides from the first pool; (e) linearizing each circularized product in the third pool via digestion by one or more site-specific nuclease(s) that recognizes the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence in each of the circularized products in the third pool; and (f) assembling the linearized products into cloning vectors by in vitro or in vivo cloning methods.
In some cases, the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence is a homing endonuclease recognition sequence. In some cases, the one or more site-specific nuclease(s) for the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence is a homing endonuclease. In some cases, the common overlap sequence comprises an assembly overlap sequence of at least 1 nucleotide and the assembly in step (b) is performed by an overlap-based DNA assembly method. In some cases, the common overlap sequence comprises an assembly overlap sequence of from 10 to 25 nucleotides and the assembly in step (b) is performed by an overlap-based DNA assembly method. In some cases, the overlap-based DNA assembly method is selected from SOE-PCR or an in vitro overlap-assembly method (e.g., HiFi assembly).
In some cases, the one or more site-specific nuclease recognition sequence(s) present in the common overlap sequence on the 5′ end of the first polynucleotide is complementary to the one or more site-specific nuclease recognition sequence(s) present in the common overlap sequence on the 3′ end of the second polynucleotide in each pair, and wherein the utilizing the common overlap sequences of the first and second polynucleotides in each pair in step (b) entails performing SOE-PCR. In some cases, the utilizing the common overlap sequences of the first and second polynucleotides in each pair in step (b) entails digesting the one or more site-specific nuclease recognition sequences present in the common overlap sequence on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair with one or more site specific nucleases for the one or more site-specific nuclease recognition sequences to generate single-stranded overhangs on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair that comprise complementary sequence; and ligating the complementary sequence present on the single-stranded overhang on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair. In some cases, the assembling of step (d) is performed using an overlap-based DNA assembly method. In some cases, the overlap-based DNA assembly is selected from SOE-PCR and an in vitro overlap-assembly method (e.g., HiFi assembly). In some cases, the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair comprise an additional set of one or more site-specific nuclease recognition sequences and the first assembly overlap sequence and the second assembly overlap sequence in each insert polynucleotide in the second pool comprise one or more site-specific nuclease recognition sequences.
In some cases, the assembling in step (d) entails digesting the additional one or more site-specific nuclease recognition sequences present on the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair and the one or more site-specific nuclease recognition sequences present in the first and second assembly sequences in each insert polynucleotide from the second pool with one or more site specific nucleases for the additional one or more site-specific nuclease recognition sequences on the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair and the one or more site-specific nuclease recognition sequences present in the first and second assembly sequences in each insert polynucleotide from the second pool to generate a single-stranded overhang on the 3′ end of the first polynucleotide that comprises sequence complementary to sequence present on a single-stranded overhang on the 5′ end of the first assembly sequence of an insert polynucleotide from the second pool and a single stranded overhang on the 5′ end of the second polynucleotide that comprises sequence complementary to a sequence present on a single-stranded overhang on the 3′ end of the second assembly sequence of the same insert polynucleotide from the second pool; and ligating the complementary sequence present on the single-stranded overhangs.
In some cases, the cloning vectors of step (f) comprise one or more site-specific nuclease recognition sequences. In some cases, the assembling in step (f) entails digesting the one or more site-specific nuclease recognition sequences in the cloning vectors with the one or more site-specific nucleases for the one or more site-specific nuclease recognition sequences present in the cloning vectors, wherein the digesting generates single-stranded overhangs on opposing ends of the cloning vectors, wherein the single-stranded overhang on one of the opposing ends of the cloning vector comprises sequence complementary to an end of the linearized product generated in step (e) and the single-stranded overhang on the other of the opposing ends of the cloning vectors comprises sequence complementary to an opposing end of the linearized product generated in step (e); and ligating the complementary sequences present on the single-stranded overhangs of the cloning vectors and the linearized products from step (e). In some cases, the first pool is generated by selecting pairs of polynucleotide sequences from a larger set of such sequences such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector. In some cases, the specified threshold is between 5 and 15 contiguous nucleotides. In some cases, the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprises 1 or more nucleotides that are complementary to the opposing terminal ends of the single nucleic acid fragment.
In some cases, the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprises about 25 nucleotides that are complementary to the opposing terminal ends of the single nucleic acid fragment. In some cases, prior to step (a), the first pool of polynucleotides is generated by combining a mixture containing each first polynucleotide from the pairs of polynucleotides with a mixture containing each second polynucleotide from the pairs of polynucleotides. In some cases, each pair in the first pool is double-stranded DNA (dsDNA) or single-stranded DNA (ssDNA). In some cases, each insert polynucleotide in the second pool is dsDNA or ssDNA. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to a target genomic locus in a host cell. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a gene that is part of a metabolic pathway. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a functional domain or one or more proteins. In some cases, each insert polynucleotide in the second pool comprises one or more payload sequences located between the first assembly overlap sequence and the second assembly overlap sequence. In some cases, the one or more payload sequences are selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, terminators, unique identifier sequence or portions thereof. In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to a different target genomic locus in a host cell as compared to each other pair in the first pool.
In some cases, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprises sequence corresponding to the same target genomic locus in a host cell. In some cases, each payload sequence in the insert polynucleotides in the second pool is different from the payload sequence in each other insert polynucleotide in the second pool. In some cases, each payload sequence in the insert polynucleotides in the second pool is the same as the payload sequence in each other insert polynucleotide in the second pool. In some cases, each insert polynucleotide in the second pool is generated by: (i) performing a polymerase chain reaction (PCR) on a mixture comprising the payload sequence, a forward primer and a reverse primer, wherein the forward primer comprises from 5′ to 3′, a short stretch of one or more nucleotides complementary to the payload sequence, the first assembly overlap sequence, one or more recognition sequences for one or more site-specific nuclease(s), the second assembly overlap sequence and a second stretch of one or more nucleotides complementary to the payload sequence and wherein the reverse primer comprises sequence complementary to the payload sequence or to other sequence downstream of the payload sequence, wherein the PCR generates a PCR product comprising from 5′ to 3′, the short stretch of nucleic acid complementary to the payload sequence, the first assembly overlap sequence, the one or more site-specific nuclease recognition sequence(s), the second assembly overlap sequence and the payload sequence; (ii) circularizing the PCR product via an assembly method selected from the group consisting of splicing and overlap-extension PCR (SOE-PCR), restriction-ligation, blunt-end ligation, overlap-based assembly method, and recombination-based method, or any other enzymatic or chemical method for joining two DNA molecules; and (iii) linearizing the circularized PCR product with one or more site-specific nuclease(s) that recognizes the one or more site-specific nuclease recognition sequence(s), thereby generating the second pool of polynucleotides. In some cases, the site-specific nuclease(s) is one or more of restriction endonuclease(s), Type IIs endonuclease(s), homing endonuclease(s), RNA-guided nuclease(s), DNA-guided nuclease(s), zinc-finger nuclease(s), TALEN(s) or nicking enzyme(s).
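
By way of illustration only, the topology of steps (b), (d) and (e) of the foregoing aspect can be traced with simple string operations. This is a minimal sketch under stated assumptions: molecules are modeled as top-strand strings, overlapping ends are modeled as identical sequence, the nuclease is modeled as making a blunt cut at the midpoint of its recognition sequence (real site-specific nucleases, including homing endonucleases, typically leave overhangs), and all function names and sequences, including the recognition sequence, are arbitrary placeholders.

```python
# Illustrative sketch only; names, cut model and sequences are hypothetical.


def join_pair_inside_out(first, second, common):
    """Step (b): place the common overlap sequence between the pair so that
    the 5' end of `second` and the 3' end of `first` are the free termini."""
    return second + common + first


def circularize_with_insert(fragment, insert, overlap):
    """Step (d): close the circle with an insert whose 5' overlap matches the
    fragment's 3' terminus and whose 3' overlap matches its 5' terminus.
    The circle is returned as a string read from an arbitrary origin."""
    if len(insert) < 2 * overlap:
        raise ValueError("insert is too short to carry both overlaps")
    if insert[:overlap] != fragment[-overlap:]:
        raise ValueError("5' overlap does not match the first polynucleotide")
    if insert[-overlap:] != fragment[:overlap]:
        raise ValueError("3' overlap does not match the second polynucleotide")
    return fragment + insert[overlap:len(insert) - overlap]


def linearize_at_site(circle, site):
    """Step (e): open the circle at the midpoint of its single recognition
    site (blunt-cut assumption), searching across the string origin."""
    n = len(circle)
    wrapped = circle + circle[:len(site) - 1]
    starts = [i for i in range(n) if wrapped.startswith(site, i)]
    if len(starts) != 1:
        raise ValueError("expected exactly one recognition site in the circle")
    cut = (starts[0] + len(site) // 2) % n
    return circle[cut:] + circle[:cut]
```

Tracing a single pair through these three functions shows the intended outcome: the linearized product reads first polynucleotide, payload, second polynucleotide, flanked by the two halves of the recognition sequence that are then available for assembly into a cloning vector in step (f).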

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 depicts a method for multiplexed, deterministic assembly of DNA libraries showing an initial composition of insert polynucleotide(s) and first polynucleotides comprising a vector overlap assembly sequence and a second polynucleotide comprising a vector overlap assembly sequence, and optional cloning vector. The insert polynucleotide can comprise a payload sequence that has a length of zero nucleotides if a deletion, or nonzero if an insertion or replacement.

FIG. 2 illustrates an inside-out assembly method to pre-associate first and second polynucleotides for use in the method of FIG. 1 to allow for assembling insert polynucleotides (e.g., promoters) longer than a maximum synthetic oligonucleotide length.

FIG. 3 illustrates adaptation of the method of FIG. 1 to allow for assembling insert polynucleotides (e.g., promoters) longer than the maximum synthetic oligonucleotide length.

FIG. 4 illustrates assembly of deterministic libraries using the method of FIG. 1. The number of unique loci, payloads, and total possible constructs in each library is given in the table.

FIG. 5 illustrates the results of the successful in vitro assembly of deterministic libraries employing a pool comprising precisely designed DNA parts including a circular-permuted payload (insert). The long bars at the top indicate the structure of the plasmids to be assembled in the pool, and the shorter bars below represent Sanger sequences aligned to the corresponding reference sequence for three separate samples from the pool of assemblies. Faint vertical lines at the inner ends of the reads represent expected sequencing artifacts at the tail ends of Sanger reads.

FIG. 6 illustrates total success rates for pooled assemblies for which payload containing parts were created via PCR using primers that append the assembly overlaps from templates derived from a host genome.

DETAILED DESCRIPTION

Definitions

While the following terms are believed to be well understood by one of ordinary skill in the art, the following definitions are set forth to facilitate explanation of the presently disclosed subject matter.

As used herein, the term “a” or “an” can refer to one or more of that entity, i.e., can refer to plural referents. As such, the terms “a” or “an”, “one or more” and “at least one” can be used interchangeably herein. In addition, reference to “an element” by the indefinite article “a” or “an” does not exclude the possibility that more than one of the elements is present, unless the context clearly requires that there is one and only one of the elements.

Unless the context requires otherwise, throughout the present specification and claims, the word “comprise” and variations thereof, such as “comprises” and “comprising”, are to be construed in an open, inclusive sense, that is, as “including, but not limited to”.

Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification may not necessarily all be referring to the same embodiment. It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination.

As used herein, the terms “cellular organism”, “microorganism” or “microbe” should be taken broadly. These terms are used interchangeably and include, but are not limited to, the two prokaryotic domains, Bacteria and Archaea, as well as certain eukaryotic fungi and protists. In some embodiments, the disclosure refers to the “microorganisms” or “cellular organisms” or “microbes” of lists/tables and figures present in the disclosure. This characterization can refer to not only the identified taxonomic genera of the tables and figures, but also the identified taxonomic species, as well as the various novel and newly identified or designed strains of any organism in said tables or figures. The same characterization holds true for the recitation of these terms in other parts of the Specification, such as in the Examples.

As used herein, the term “prokaryotes” is art recognized and refers to cells that contain no nucleus or other cell organelles. The prokaryotes are generally classified in one of two domains, the Bacteria and the Archaea. The definitive difference between organisms of the Archaea and Bacteria domains is based on fundamental differences in the nucleotide base sequence in the 16S ribosomal RNA.

As used herein, the term “Archaea” refers to a categorization of organisms of the division Mendosicutes, typically found in unusual environments and distinguished from the rest of the prokaryotes by several criteria, including the number of ribosomal proteins and the lack of muramic acid in cell walls. On the basis of ssrRNA analysis, the Archaea consist of two phylogenetically-distinct groups: Crenarchaeota and Euryarchaeota. On the basis of their physiology, the Archaea can be organized into three types: methanogens (prokaryotes that produce methane); extreme halophiles (prokaryotes that live at very high concentrations of salt (NaCl)); and extreme (hyper) thermophiles (prokaryotes that live at very high temperatures). Besides the unifying archaeal features that distinguish them from Bacteria (i.e., no murein in cell wall, ether-linked membrane lipids, etc.), these prokaryotes exhibit unique structural or biochemical attributes which adapt them to their particular habitats. The Crenarchaeota consists mainly of hyperthermophilic sulfur-dependent prokaryotes and the Euryarchaeota contains the methanogens and extreme halophiles.

As used herein, “bacteria” or “eubacteria” can refer to a domain of prokaryotic organisms. Bacteria include at least 11 distinct groups as follows: (1) Gram-positive (gram+) bacteria, of which there are two major subdivisions: (a) high G+C group (Actinomycetes, Mycobacteria, Micrococcus, others) and (b) low G+C group (Bacillus, Clostridia, Lactobacillus, Staphylococci, Streptococci, Mycoplasmas); (2) Proteobacteria, e.g., purple photosynthetic and non-photosynthetic Gram-negative bacteria (includes most “common” Gram-negative bacteria); (3) Cyanobacteria, e.g., oxygenic phototrophs; (4) Spirochetes and related species; (5) Planctomyces; (6) Bacteroides, Flavobacteria; (7) Chlamydia; (8) Green sulfur bacteria; (9) Green non-sulfur bacteria (also anaerobic phototrophs); (10) Radioresistant micrococci and relatives; (11) Thermotoga and Thermosipho thermophiles.

As used herein, a “eukaryote” is any organism whose cells contain a nucleus and other organelles enclosed within membranes. Eukaryotes belong to the taxon Eukarya or Eukaryota. The defining feature that sets eukaryotic cells apart from prokaryotic cells (the aforementioned Bacteria and Archaea) is that they have membrane-bound organelles, especially the nucleus, which contains the genetic material, and is enclosed by the nuclear envelope.

As used herein, the terms “genetically modified host cell,” “recombinant host cell,” and “recombinant strain” are used interchangeably herein and can refer to host cells that have been genetically modified by the cloning and transformation methods of the present disclosure. Thus, the terms include a host cell (e.g., bacteria, yeast cell, fungal cell, CHO, human cell, etc.) that has been genetically altered, modified, or engineered, such that it exhibits an altered, modified, or different genotype and/or phenotype (e.g., when the genetic modification affects coding nucleic acid sequences of the microorganism), as compared to the naturally-occurring organism from which it was derived. It is understood that in some embodiments, the terms refer not only to the particular recombinant host cell in question, but also to the progeny or potential progeny of such a host cell.

As used herein, the term “wild-type microorganism” or “wild-type host cell” can describe a cell that occurs in nature, i.e., a cell that has not been genetically modified.

As used herein, the term “genetically engineered” may refer to any manipulation of a host cell's genome (e.g., by insertion, deletion, mutation, or replacement of nucleic acids).

As used herein, the term “control” or “control host cell” can refer to an appropriate comparator host cell for determining the effect of a genetic modification or experimental treatment. In some embodiments, the control host cell is a wild type cell. In other embodiments, a control host cell is genetically identical to the genetically modified host cell, save for the genetic modification(s) differentiating the treatment host cell. In some embodiments, the present disclosure teaches the use of parent strains as control host cells (e.g., the Si strain that was used as the basis for the strain improvement program). In other embodiments, a control host cell may be a genetically identical cell that lacks a specific promoter or SNP being tested in the treatment host cell.

As used herein, the term “allele(s)” can mean any of one or more alternative forms of a gene, all of which alleles relate to at least one trait or characteristic. In a diploid cell, the two alleles of a given gene occupy corresponding loci on a pair of homologous chromosomes.

As used herein, the term “locus” (loci plural) can mean any site at which an edit to the native genomic sequence is desired. In one embodiment, said term can mean a specific place or places or a site on a chromosome where, for example, a gene or genetic marker is found.

As used herein, the term “genetically linked” can refer to two or more traits that are co-inherited at a high rate during breeding such that they are difficult to separate through crossing.

A “recombination” or “recombination event” as used herein can refer to a chromosomal crossing over or independent assortment.

As used herein, the term “phenotype” can refer to the observable characteristics of an individual cell, cell culture, organism, or group of organisms, which results from the interaction between that individual's genetic makeup (i.e., genotype) and the environment.

As used herein, the term “chimeric” or “recombinant” when describing a nucleic acid sequence or a protein sequence can refer to a nucleic acid, or a protein sequence, that links at least two heterologous polynucleotides, or two heterologous polypeptides, into a single macromolecule, or that rearranges one or more elements of at least one natural nucleic acid or protein sequence. For example, the term “recombinant” can refer to an artificial combination of two otherwise separated segments of sequence, e.g., by chemical synthesis or by the manipulation of isolated segments of nucleic acids by genetic engineering techniques.

As used herein, a “synthetic nucleotide sequence” or “synthetic polynucleotide sequence” is a nucleotide sequence that is not known to occur in nature or that is not naturally occurring. Generally, such a synthetic nucleotide sequence can comprise at least one nucleotide difference when compared to any other naturally occurring nucleotide sequence.

As used herein, the term “nucleic acid” can refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides, or analogs thereof. This term can refer to the primary structure of the molecule, and thus includes double- and single-stranded DNA, as well as double- and single-stranded RNA. It also includes modified nucleic acids such as methylated and/or capped nucleic acids, nucleic acids containing modified bases, backbone modifications, and the like. The terms “nucleic acid” and “nucleotide sequence” are used interchangeably.

As used herein, the term “gene” can refer to any segment of DNA associated with a biological function. Thus, genes can include, but are not limited to, coding sequences and/or the regulatory sequences required for their expression. Genes can also include non-expressed DNA segments that, for example, form recognition sequences for other proteins. Genes can be obtained from a variety of sources, including cloning from a source of interest or synthesizing from known or predicted sequence information, and may include sequences designed to have desired parameters.

As used herein, the term “homologous” or “homologue” or “ortholog” or “orthologue” is known in the art and can refer to related sequences that share a common ancestor or family member and are determined based on the degree of sequence identity.

The terms “homology,” “homologous,” “substantially similar” and “corresponding substantially” can be used interchangeably herein. Said terms can refer to nucleic acid fragments wherein changes in one or more nucleotide bases do not affect the ability of the nucleic acid fragment to mediate gene expression or produce a certain phenotype. These terms can also refer to modifications of the nucleic acid fragments of the instant disclosure such as deletion or insertion of one or more nucleotides that do not substantially alter the functional properties of the resulting nucleic acid fragment relative to the initial, unmodified fragment. It is therefore understood, as those skilled in the art will appreciate, that the disclosure encompasses more than the specific exemplary sequences. These terms describe the relationship between a gene found in one species, subspecies, variety, cultivar or strain and the corresponding or equivalent gene in another species, subspecies, variety, cultivar or strain. For purposes of this disclosure, homologous sequences are compared.

“Homologous sequences” or “homologues” or “orthologs” are thought, believed, or known to be functionally related. A functional relationship may be indicated in any one of a number of ways, including, but not limited to: (a) degree of sequence identity and/or (b) the same or similar biological function. Preferably, both (a) and (b) are indicated. Sequence homology between amino acid or nucleic acid sequences can be defined in terms of shared ancestry. Two segments of nucleic acid can have shared ancestry because of either a speciation event (orthologs) or a duplication event (paralogs). Homology among amino acid or nucleic acid sequences can be inferred from their sequence similarity such that amino acid or nucleic acid sequences are said to be homologous if said amino acid or nucleic acid sequences share significant similarity. Significant similarity can be strong evidence that two sequences are related by divergent evolution from a common ancestor. Alignments of multiple sequences can be used to discover the homologous regions. Homology can be determined using software programs readily available in the art, such as those discussed in Current Protocols in Molecular Biology (F. M. Ausubel et al., eds., 1987) Supplement 30, section 7.718, Table 7.71. Some alignment programs are BLAST (NCBI), MacVector (Oxford Molecular Ltd, Oxford, U.K.), ALIGN Plus (Scientific and Educational Software, Pennsylvania) and AlignX (Vector NTI, Invitrogen, Carlsbad, Calif.). Another alignment program is Sequencher (Gene Codes, Ann Arbor, Mich.), using default parameters.

As used herein, the term “endogenous” or “endogenous gene,” can refer to the naturally occurring gene, in the location in which it is naturally found within the host cell genome. In the context of the present disclosure, operably linking a heterologous promoter to an endogenous gene means genetically inserting a heterologous promoter sequence in front of an existing gene, in the location where that gene is naturally present. An endogenous gene as described herein can include alleles of naturally occurring genes that have been mutated according to any of the methods of the present disclosure.

As used herein, the term “exogenous” is used interchangeably with the term “heterologous,” and refers to a substance coming from some source other than its native source. For example, the terms “exogenous protein,” or “exogenous gene” refer to a protein or gene from a non-native source or location, and that have been artificially supplied to a biological system.

As used herein, the term “nucleotide change” refers to, e.g., nucleotide substitution, deletion, and/or insertion, as is well understood in the art. For example, mutations can contain alterations that produce silent substitutions, additions, or deletions, but do not alter the properties or activities of the encoded protein or how the proteins are made. Alternatively, mutations can be nonsynonymous substitutions or changes that can alter the amino acid sequence of the encoded protein and can result in an alteration in properties or activities of the protein.

As used herein, the term “protein modification” can refer to, e.g., amino acid substitution, amino acid modification, deletion, and/or insertion, as is well understood in the art.

As used herein, the term “at least a portion” or “fragment” of a nucleic acid or polypeptide can mean a portion having the minimal size characteristics of such sequences, or any larger fragment of the full-length molecule, up to and including the full-length molecule. A fragment of a polynucleotide of the disclosure may encode a biologically active portion of a genetic regulatory element. A biologically active portion of a genetic regulatory element can be prepared by isolating a portion of one of the polynucleotides of the disclosure that comprises the genetic regulatory element and assessing activity as described herein. Similarly, a portion of a polypeptide may be 4 amino acids, 5 amino acids, 6 amino acids, 7 amino acids, and so on, going up to the full length polypeptide. The length of the portion to be used will depend on the particular application. A portion of a nucleic acid useful as a hybridization probe may be as short as 12 nucleotides; in some embodiments, it is 20 nucleotides. A portion of a polypeptide useful as an epitope may be as short as 4 amino acids. A portion of a polypeptide that performs the function of the full-length polypeptide would generally be longer than 4 amino acids.

Variant polynucleotides can also encompass sequences derived from a mutagenic and recombinogenic procedure such as DNA shuffling. Strategies for such DNA shuffling are known in the art. See, for example, Stemmer (1994) PNAS 91:10747-10751; Stemmer (1994) Nature 370:389-391; Crameri et al. (1997) Nature Biotech. 15:436-438; Moore et al. (1997) J. Mol. Biol. 272:336-347; Zhang et al. (1997) PNAS 94:4504-4509; Crameri et al. (1998) Nature 391:288-291; and U.S. Pat. Nos. 5,605,793 and 5,837,458.

For PCR amplifications disclosed herein, oligonucleotide primers can be designed for use in PCR reactions to amplify corresponding DNA sequences from cDNA or genomic DNA extracted from any organism of interest. Methods for designing PCR primers and PCR cloning are generally known in the art and are disclosed in Sambrook et al. (2001) Molecular Cloning: A Laboratory Manual (3rd ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.). See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and Applications (Academic Press, New York); Innis and Gelfand, eds. (1995) PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds. (1999) PCR Methods Manual (Academic Press, New York). Known methods of PCR include, but are not limited to, methods using paired primers, nested primers, single specific primers, degenerate primers, gene-specific primers, vector-specific primers, partially-mismatched primers, and the like.

The term “primer” as used herein can refer to an oligonucleotide which is capable of annealing to the amplification target allowing a DNA polymerase to attach, thereby serving as a point of initiation of DNA synthesis when placed under conditions in which synthesis of primer extension product is induced, i.e., in the presence of nucleotides and an agent for polymerization such as DNA polymerase and at a suitable temperature and pH. The (amplification) primer can be single stranded for maximum efficiency in amplification. The primer can be an oligodeoxyribonucleotide. The primer must be sufficiently long to prime the synthesis of extension products in the presence of the agent for polymerization. The exact lengths of the primers will depend on many factors, including temperature and composition (A/T vs. G/C content) of primer. A pair of bi-directional primers consists of one forward and one reverse primer as commonly used in the art of DNA amplification such as in PCR amplification.

As used herein, “promoter” can refer to a DNA sequence capable of controlling the expression of a coding sequence or functional RNA. In some embodiments, the promoter sequence consists of proximal and more distal upstream elements, the latter elements often referred to as enhancers. Accordingly, an “enhancer” can be a DNA sequence that can stimulate promoter activity, and may be an innate element of the promoter or a heterologous element inserted to enhance the level or tissue specificity of a promoter. Promoters may be derived in their entirety from a native gene, or be composed of different elements derived from different promoters found in nature, or even comprise synthetic DNA segments. It is understood by those skilled in the art that different promoters may direct the expression of a gene in different tissues or cell types, or at different stages of development, or in response to different environmental conditions. It is further recognized that since in most cases the exact boundaries of regulatory sequences have not been completely defined, DNA fragments of some variation may have identical promoter activity.

As used herein, the phrases “recombinant construct”, “expression construct”, “chimeric construct”, “construct”, and “recombinant DNA construct” are used interchangeably herein. A recombinant construct can comprise an artificial combination of nucleic acid fragments, e.g., regulatory and coding sequences that are not found together in nature. For example, a chimeric construct may comprise regulatory sequences and coding sequences that are derived from different sources, or regulatory sequences and coding sequences derived from the same source, but arranged in a manner different than that found in nature. Such construct may be used by itself or may be used in conjunction with a vector. If a vector is used then the choice of vector is dependent upon the method that will be used to transform host cells as is well known to those skilled in the art. For example, a plasmid vector can be used. The skilled artisan is well aware of the genetic elements that must be present on the vector in order to successfully transform, select and propagate host cells comprising any of the isolated nucleic acid fragments of the disclosure. The skilled artisan will also recognize that different independent transformation events will result in different levels and patterns of expression (Jones et al., (1985) EMBO J. 4:2411-2418; De Almeida et al., (1989) Mol. Gen. Genetics 218:78-86), and thus that multiple events must be screened in order to obtain lines displaying the desired expression level and pattern. Such screening may be accomplished by direct sequencing, Southern analysis of DNA, Northern analysis of mRNA expression, immunoblotting analysis of protein expression, or phenotypic analysis, among others. Vectors can be plasmids, viruses, bacteriophages, pro-viruses, phagemids, transposons, artificial chromosomes, and the like, that replicate autonomously or can integrate into a chromosome of a host cell.
A vector can also be a naked RNA polynucleotide, a naked DNA polynucleotide, a polynucleotide composed of both DNA and RNA within the same strand, a poly-lysine-conjugated DNA or RNA, a peptide-conjugated DNA or RNA, a liposome-conjugated DNA, or the like, that is not autonomously replicating.

As used herein, the term “expression” refers to the production of a functional end-product, e.g., an mRNA or a protein (precursor or mature).

“Operably linked” or “functionally linked” can mean the sequential arrangement of any functional payload according to the disclosure (e.g., promoter, terminator, degron, solubility tag, etc.) with a further oligo- or polynucleotide. In some cases, the sequential arrangement can result in transcription of said further polynucleotide. In some cases, the sequential arrangement can result in translation of said further polynucleotide. The functional payloads can be present upstream or downstream of the further oligo- or polynucleotide. In one example, “operably linked” or “functionally linked” can mean a promoter controls the transcription of the gene adjacent or downstream or 3′ to said promoter. In another example, “operably linked” or “functionally linked” can mean a terminator controls termination of transcription of the gene adjacent or upstream or 5′ to said terminator.

The term “product of interest” or “biomolecule” as used herein can refer to any product produced by microbes from feedstock. In some cases, the product of interest may be a small molecule, enzyme, peptide, amino acid, organic acid, synthetic compound, fuel, alcohol, etc. For example, the product of interest or biomolecule may be any primary or secondary extracellular metabolite. The primary metabolite may be, inter alia, ethanol, citric acid, lactic acid, glutamic acid, glutamate, lysine, threonine, tryptophan and other amino acids, vitamins, polysaccharides, etc. The secondary metabolite may be, inter alia, an antibiotic compound like penicillin, or an immunosuppressant like cyclosporin A, a plant hormone like gibberellin, a statin drug like lovastatin, a fungicide like griseofulvin, etc. The product of interest or biomolecule may also be any intracellular component produced by a microbe, such as: a microbial enzyme, including: catalase, amylase, protease, pectinase, glucose isomerase, cellulase, hemicellulase, lipase, lactase, streptokinase, and many others. The intracellular component may also include recombinant proteins, such as insulin, hepatitis B vaccine, interferon, granulocyte colony-stimulating factor, streptokinase and others.

As used herein, the term “HTP genetic design library” or “library” refers to collections of genetic perturbations according to the present disclosure. In some embodiments, the libraries of the present disclosure may manifest as i) a collection of sequence information in a database or other computer file, ii) a collection of genetic constructs encoding for the aforementioned series of genetic elements, or iii) host cell strains comprising said genetic elements. In some embodiments, the libraries of the present disclosure may refer to collections of individual elements (e.g., collections of promoters for PRO swap libraries, collections of terminators for STOP swap libraries, collections of protein solubility tags for SOLUBILITY TAG swap libraries, or collections of protein degradation tags for DEGRADATION TAG swap libraries). In other embodiments, the libraries of the present disclosure may also refer to combinations of genetic elements, such as combinations of promoter:genes, gene:terminator, or even promoter:gene:terminators. In some embodiments, the libraries of the present disclosure may also refer to combinations of promoters, terminators, protein solubility tags and/or protein degradation tags. In some embodiments, the libraries of the present disclosure further comprise metadata associated with the effects of applying each member of the library in host organisms. For example, a library as used herein can include a collection of promoter::gene sequence combinations, together with the resulting effect of those combinations on one or more phenotypes in a particular species, thus improving the future predictive value of using said combination in future promoter swaps.

As used herein, the term “SNP” refers to Single Nucleotide Polymorphism(s). In some embodiments, SNPs of the present disclosure should be construed broadly, and include single nucleotide polymorphisms, sequence insertions, deletions, inversions, and other sequence replacements. As used herein, the term “non-synonymous” or “non-synonymous SNPs” refers to mutations that lead to coding changes in host cell proteins.

A “high-throughput (HTP)” method of genomic engineering may involve the utilization of at least one piece of automated equipment (e.g., a liquid handler or plate handler machine) to carry out at least one step of said method.

The term “polynucleotide” as used herein encompasses oligonucleotides and refers to a nucleic acid of any length. Polynucleotides may be DNA or RNA. Polynucleotides may be single-stranded (ss) or double-stranded (ds) unless otherwise specified. Polynucleotides may be synthetic, for example, synthesized in a DNA synthesizer, or naturally occurring, for example, extracted from a natural source, or derived from cloned or amplified material. Polynucleotides referred to herein can contain modified bases or nucleotides.

The term “pool”, as used herein, can refer to a collection of at least 2 polynucleotides. In some embodiments, a set of polynucleotides may comprise at least 5, at least 10, at least 12 or at least 15 or more polynucleotides.

The term “overlapping sequence”, or “overlapping assembly sequence” or “assembly overlap sequence” as used herein can refer to a sequence that is complementary in two polynucleotides and where the overlapping sequence is ss on one polynucleotide such that it can be hybridized to another overlapping complementary ss region on another polynucleotide. An overlapping sequence may be at or close to (e.g., within about 5, 10, 20 nucleotides of) the terminal ends of two distinct polynucleotides. For example, if the two distinct polynucleotides are single-stranded, then the assembly overlap sequence would be present on the 3′ terminal ends of each of the single-stranded polynucleotides. Alternatively, if the two distinct polynucleotides are double-stranded, then the assembly overlap sequence of one of the polynucleotides can be present on the 3′ terminal end of said polynucleotide (i.e., 3′ end in reference to the top strand of the ds polynucleotide), while the complementary assembly overlap sequence on the other polynucleotide can be present at the 5′ end of said polynucleotide (i.e., 5′ end in reference to the top strand of the ds polynucleotide). As necessary, the assembly overlap sequence on any ds polynucleotide may be made available by removing any non-overlapping sequence. The removal can be enzymatic such as through the use of a 3′-5′ exonuclease activity of a polymerase.
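By way of non-limiting illustration only, the single-stranded case described above (assembly overlap sequences at the 3′ terminal ends of two ss polynucleotides) can be checked in silico. The following is a minimal sketch in Python; the function names, the `min_len` parameter and the example sequences are hypothetical and are not part of the disclosure. Sequences are assumed to be written 5′ to 3′ and to contain only A, C, G and T.

```python
# Illustrative sketch only; names and sequences are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def three_prime_overlap(a: str, b: str, min_len: int = 4) -> int:
    """Length of the longest 3'-terminal region of `a` that is complementary
    to the 3'-terminal region of `b` (both single-stranded, written 5'->3').

    Returns 0 if no overlap of at least `min_len` nucleotides is found.
    """
    a = a.upper()
    # The 3' end of b corresponds to the 5' start of its reverse complement.
    rc_b = reverse_complement(b)
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == rc_b[:k]:
            return k
    return 0


# Two hypothetical ss polynucleotides sharing an 8-nt 3' assembly overlap.
oligo_1 = "TTTTTTTTGACCTGAA"
oligo_2 = "CCCCCCCCTTCAGGTC"
print(three_prime_overlap(oligo_1, oligo_2))
```

The sketch requires exact complementarity; tolerance for mismatches or for an overlap located a few nucleotides away from the terminus would need additional logic.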

As used herein, the term “assembling” can refer to a reaction in which two or more, four or more, six or more, eight or more, ten or more, 12 or more, or 15 or more polynucleotides (e.g., four or more polynucleotides) are joined to one another to make a longer polynucleotide.

As used herein, the term “incubating under suitable reaction conditions” can refer to maintaining a reaction at a suitable temperature and time to achieve the desired results, i.e., polynucleotide assembly. Reaction conditions suitable for the enzymes and reagents used in the present method are known (e.g., as described in the Examples herein) and, as such, suitable reaction conditions for the present method can be readily determined. These reaction conditions may change depending on the enzymes used (e.g., depending on their optimum temperatures, etc.).

As used herein, the term “joining” can refer to the production of covalent linkage between two sequences.

As used herein, the term “composition” can refer to a combination of reagents that may contain other reagents, e.g., glycerol, salt, dNTPs, etc., in addition to those listed. A composition may be in any form, e.g., aqueous or lyophilized, and may be at any state (e.g., frozen or in liquid form).

As used herein, a “vector” is a suitable DNA into which a fragment or DNA assembly may be integrated such that the engineered vector can be replicated in a host cell. A linearized vector may be created by restriction endonuclease digestion of a circular vector or by PCR. The concentration of fragments and/or linearized vectors can be determined by gel electrophoresis or other means.

Overview

Provided herein are methods and compositions that facilitate multiple assemblies to be produced in a single reaction in a deterministic rather than combinatorial manner. The methods and compositions provided herein impart the time, cost, and throughput benefits of multiplexed assembly while still enabling the creation of a library where all output assemblies are determined in advance. The methods and compositions provided herein allow for the creation of many plasmids or constructs in a single assembly reaction, reducing the number of total reactions required to create libraries of thousands of plasmids or constructs. The methods and compositions provided herein also allow for assembling a defined subset of desired plasmids or constructs out of a larger set of numerous possible combinations. In some cases, the methods and compositions provided herein minimize the number of unique parts (‘homology arms’) that need to be amplified from genomes (or synthesized) by not including any payload or insert sequence-specific assembly overlaps. This can eliminate the need to amplify multiple copies of the same homology-arm pair designed to be combined with multiple payloads or insert sequences. Further, diversity arising from combinations of payload/insert sequence and homology arm pairs is specified by sequences on the payload/insert sequence itself. The multitude of resulting payload sequences can be produced synthetically and inexpensively. Libraries generated using the methods and compositions provided herein can be suitable for any number of applications such as, for example, any genome editing methods or any pooled pathway assembly. The genome editing methods known in the art can be those that do not require tailored sites for RMCE (recombinase-mediated cassette exchange) for editing the genome of a cell at multiple arbitrary locations such as, for example, scarless genomic editing.
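By way of non-limiting illustration only, the reduction in unique parts described above can be expressed as simple bookkeeping. The following minimal Python sketch assumes a hypothetical design list of (locus, payload) pairs chosen in advance, and assumes that each locus uses one pair of homology arms; the function name, dictionary keys and example identifiers are hypothetical and are not taken from the disclosure.

```python
# Illustrative sketch only; names and the example design are hypothetical.
def parts_required(designs):
    """Count parts for a set of pre-specified (locus, payload) designs.

    Assumes two homology arms per locus. Compares arms that carry no
    payload-specific overlaps (shared across payloads) with arms that would
    have to be re-made for every locus/payload combination.
    """
    designs = set(designs)
    loci = {locus for locus, _ in designs}
    payloads = {payload for _, payload in designs}
    return {
        "constructs": len(designs),
        # Size of the full combinatorial space that is NOT being built.
        "combinatorial_space": len(loci) * len(payloads),
        # Arms reused for every payload at a locus.
        "arm_parts_shared": 2 * len(loci),
        # Arms needed if each arm carried a payload-specific overlap.
        "arm_parts_payload_specific": 2 * len(designs),
        # One synthesized insert per desired construct.
        "insert_parts": len(designs),
    }


example = [("locusA", "P1"), ("locusA", "P2"), ("locusB", "P1"), ("locusC", "P3")]
print(parts_required(example))
```

In this hypothetical example, four desired constructs are drawn from a space of nine possible locus and payload combinations, and six shared arms suffice where eight payload-specific arms would otherwise be needed.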

Provided herein is a composition comprising a mixture of polynucleotides for assembly in a deterministic fashion of a library of nucleic acid constructs. The mixture can comprise n pools of polynucleotide parts (e.g., first and second polynucleotides). The n pools can be at most, at least, or exactly 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 pools of polynucleotides. The n pools can each comprise an equal number of polynucleotide parts or they can comprise differing numbers of polynucleotide parts (e.g., first and second polynucleotides). In one embodiment, the mixture comprises 2 pools such that one of the two pools comprises first polynucleotides and the other of the two pools comprises second polynucleotides. Each first polynucleotide in the pool of first polynucleotides can have a paired second polynucleotide in the separate pool of second polynucleotides. Further to any of the above embodiments, the mixture can further comprise n−1 pools of insert or bridging polynucleotides. Each insert or bridging polynucleotide can comprise sequence complementary to an element of one of the n pools of polynucleotide parts (e.g., first polynucleotide) at its 5′ end and to an element of one of the other pools of polynucleotide parts (e.g., second polynucleotide) at its 3′ end. The insert sequences can be designed such that the assembly results in a library of polynucleotides where each polynucleotide comprises a specific element from each of the n pools of polynucleotide parts, interspersed with a specific element from each of the n−1 pools of insert polynucleotides.
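By way of non-limiting illustration, the way insert sequences can specify which parts are joined may be sketched for the two-pool case as follows. This is a toy model under stated assumptions: sequences are written as top strands 5′ to 3′, so that an overlap "complementary" to a part end appears as shared sequence; the part names, sequences, overlap length and the functions `make_insert` and `assemble` are all hypothetical.

```python
from itertools import product

OVERLAP = 4  # hypothetical overlap length, far shorter than a real one

# Hypothetical pools of first and second polynucleotides (top strands).
first_pool = {"L1": "AAAACCCC", "L2": "AAAAGGTT"}
second_pool = {"R1": "TTGGAAAA", "R2": "CCTTAAAA"}


def make_insert(first_seq, payload, second_seq, k=OVERLAP):
    """Build an insert whose 5' end matches the 3' end of one first
    polynucleotide and whose 3' end matches the 5' end of its partner."""
    return first_seq[-k:] + payload + second_seq[:k]


# One insert per desired construct; the overlaps encode the pairing.
insert_pool = {
    "I1": make_insert(first_pool["L1"], "ACGTACGT", second_pool["R1"]),
    "I2": make_insert(first_pool["L2"], "TGCATGCA", second_pool["R2"]),
}


def assemble(first_pool, insert_pool, second_pool, k=OVERLAP):
    """Return only the products whose overlaps match on both sides."""
    products = {}
    for (f_name, f), (i_name, ins), (s_name, s) in product(
        first_pool.items(), insert_pool.items(), second_pool.items()
    ):
        if f[-k:] == ins[:k] and ins[-k:] == s[:k]:
            # Overlap regions are shared, so they appear once in the product.
            products[(f_name, i_name, s_name)] = f + ins[k:-k] + s
    return products


library = assemble(first_pool, insert_pool, second_pool)
```

Of the eight possible first/insert/second combinations in this toy pool, only the two specified by the inserts are produced, which is the sense in which the pooled assembly is deterministic rather than combinatorial.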

The mixture of polynucleotides can comprise: a first pool containing pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide; and a second pool of insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to a 3′ end of a first polynucleotide and a second assembly overlap sequence at its opposing 3′ end that is complementary to a 5′ end of a second polynucleotide in a pair of polynucleotides from the first pool. In one embodiment, the composition can further comprise a cloning vector, wherein, for each pair in the first pool, a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide comprise sequence complementary to the cloning vector. The cloning vector can be any cloning vector known in the art that is suitable for propagation in a host cell such as, for example, E. coli or S. cerevisiae. In another embodiment, the composition also comprises a polymerase, an exonuclease, a ligase or any combination thereof. The polymerase can be strand displacing or non-strand displacing. The exonuclease can be a 5′-3′ exonuclease. The pairs of polynucleotides in the first pool can be double-stranded, single-stranded or a combination thereof. The insert polynucleotides in the second pool can be double-stranded, single-stranded, or a combination thereof. In one embodiment, the polymerase is non-strand displacing and the composition further comprises a crowding agent. The crowding agent can be selected from polyethylene glycol (PEG), ficoll or dextran. In one embodiment, the crowding agent is PEG. The PEG can be used at a concentration of from about 3 to about 7% (weight/volume). The PEG can be selected from PEG-200, PEG-4000, PEG-6000, PEG-8000 or PEG-20,000. In another embodiment, the polymerase is strand displacing and the composition further comprises a single-stranded DNA binding protein. The single-stranded DNA binding protein can be an extreme thermostable single-stranded DNA binding protein (ET SSB), E. coli recA, T7 gene 2.5 product, phage lambda RedB or Rac prophage RecT.

In one embodiment, a composition provided herein is a mixture of the following polynucleotides: (1) one or more first polynucleotides, (2) one or more insert polynucleotides, wherein the insert polynucleotide comprises a first assembly overlap sequence at its 5′ end and a second assembly overlap sequence at its opposing 3′ end, and (3) one or more second polynucleotides. In another embodiment, the composition is a mixture of the following polynucleotides: (1) one or more first polynucleotides, (2) one or more insert polynucleotides, wherein the insert polynucleotide comprises a first assembly overlap sequence at its 5′ end and a second assembly overlap sequence at its opposing 3′ end, (3) one or more second polynucleotides and (4) a cloning vector. Each of the one or more first polynucleotides can comprise sequence at its 3′ or distal end that is complementary to the first assembly overlap sequence present at the 5′ or proximal end of an insert polynucleotide from the one or more insert polynucleotides. Each of the one or more second polynucleotides can comprise sequence at its 5′ or proximal end that is complementary to the second assembly overlap sequence present at the 3′ or distal end of an insert polynucleotide from the one or more insert polynucleotides. Each of the one or more first polynucleotides can be paired with at least one of the one or more second polynucleotides, thereby forming one or more pairs of first and second polynucleotides. Each pair of first and second polynucleotides can comprise sequence at the distal end of the first polynucleotide that is complementary to the first assembly overlap sequence on the proximal end of an insert polynucleotide from the one or more insert polynucleotides as well as sequence at the proximal end of the second polynucleotide that is complementary to the distal end of an insert polynucleotide from the one or more insert polynucleotides.

Provided herein is a method for generating libraries of polynucleotides, the method comprising: (a) combining n pools of polynucleotide parts (e.g., first and second polynucleotides) and n−1 pools of insert or bridging polynucleotides; and (b) assembling the n pools of polynucleotide parts and n−1 pools of insert polynucleotides into a library of polynucleotides, wherein each polynucleotide in the library comprises a defined combination of an individual element from each of the n pools of polynucleotide parts and from each of the n−1 pools of bridging polynucleotides. Each insert or bridging polynucleotide in the n−1 pools of insert or bridging polynucleotides comprises a first assembly overlap sequence at its 5′ end that is complementary to a 3′ end of a first polynucleotide and a second assembly overlap sequence at its opposing 3′ end that is complementary to a 5′ end of a second polynucleotide in the n pools of first and second polynucleotides. The assembling can be performed via in vitro or in vivo overlap assembly methods. In some cases, the assembling is performed via an in vitro cloning method, wherein the mixture of the n pools of polynucleotide parts and n−1 pools of insert or bridging polynucleotides is heated to partially or fully denature any double-stranded polynucleotide parts present, then cooled at a slow rate to room temperature before being subjected to the in vitro cloning method.

Also provided herein is a method for generating libraries ofpolynucleotides, the method comprising: (a) combining a first pool ofpolynucleotides and a second pool of polynucleotides, wherein the firstpool contains pairs of polynucleotides, wherein each pair in the firstpool contains a first polynucleotide and a second polynucleotide,wherein the second pool contains insert polynucleotides, wherein eachinsert polynucleotide in the second pool comprises a first assemblyoverlap sequence at its 5′ end that is complementary to a 3′ end of afirst polynucleotide and a second assembly overlap sequence at itsopposing 3′ end that is complementary to a 5′ end of a secondpolynucleotide in a pair of polynucleotides from the first pool; (b)assembling the first pool and the second pool into a library ofpolynucleotides, wherein each polynucleotide in the library comprises aninsert polynucleotide from the second pool and a pair of firstpolynucleotides and second polynucleotides from the first pool. Theassembling can be performed via in vitro or in vivo overlap assemblymethods. In some cases, the assembling is performed via an in vitrocloning method, wherein the mixture of the first pool and the secondpool is heated to partially or fully denature polynucleotides present inthe first and the second pools, then cooled at a slow rate to roomtemperature before being subjected to the in vitro cloning method. Insome cases, the method further comprises combining a cloning vector withthe first pool and the second pool during step (a), wherein opposingends of the cloning vector comprise sequence complementary to a 5′end ofthe first polynucleotide and a 3′ end of the second polynucleotide foreach pair in the first pool. 
In some cases, the method further comprisescombining a cloning vector with the first pool prior to step (a),wherein opposing ends of the cloning vector comprise sequencecomplementary to a 5′ end of the first polynucleotide and a 3′ end ofthe second polynucleotide for each pair in the first pool. In somecases, the cloning vector and the 5′end of the first polynucleotide andthe 3′end of the second polynucleotide in each pair from the first poolcomprise one or more recognition sequences for one or more site-specificnucleases. In some cases, the method further comprises generatingsingle-stranded complementary overhangs between the opposing ends of thecloning vector and the 5′end of the first polynucleotide and the 3′endof the second polynucleotide in each pair from the first pool by addingthe one or more site-specific nucleases for the one or more recognitionsequences. In some cases, the method further comprises ligating thesingle-stranded complementary overhangs between the opposing ends of thecloning vector and the 5′ end of the first polynucleotide and the 3′ endof the second polynucleotide in each pair from the first pool. Theligating can be performed using a DNA ligase. In some cases, step (b)results in a circular product comprising an insert polynucleotide fromthe second pool, a first and second polynucleotide from a pair from thefirst pool and the cloning vector.

In one aspect, provided herein is a method for generating libraries ofpolynucleotides, the method comprising: (a) amplifying via polymerasechain reaction (PCR) a first pool of polynucleotides, wherein the firstpool contains pairs of polynucleotides, wherein each pair in the firstpool contains a first polynucleotide and a second polynucleotide, andwherein each first polynucleotide and each second polynucleotide in apair comprises a 5′ end and a 3′ end, wherein the amplifying introducesa common overlap sequence comprising one or more recognition sequencesfor one or more site-specific nucleases onto the 5′ end of a firstpolynucleotide and the 3′ end of a second polynucleotide in a pair fromthe first pool; (b) assembling each pair of first polynucleotides andsecond polynucleotides from the first pool into a single nucleic acidfragment by utilizing common overlap sequence, wherein the singlenucleic fragment for each pair comprises a first polynucleotide andsecond polynucleotide separated by the common overlap sequence from the5′ end of the first polynucleotide and the 3′ end of the secondpolynucleotide, and wherein the 3′end of the first polynucleotide andthe 5′ end of the second polynucleotide in the single nucleic fragmentfor each pair are located on opposing terminal ends of the singlenucleic acid fragment, distal to the one or more site-specific nucleaserecognition sequence(s); (c) combining the single nucleic acid fragmentsfor each pair with a second pool containing insert polynucleotides,wherein each insert polynucleotide in the second pool comprises a firstassembly overlap sequence at its 5′ end that is complementary to the 3′end of the first polynucleotide present within the single nucleic acidfragment and a second assembly overlap sequence at its opposing 3′ endthat is complementary to the 5′ end of the second polynucleotide presentwithin the single nucleic acid fragment; (d) assembling the first pooland the second pool into a third pool of circularized 
products, whereinthe assembling is performed via in vitro or in vivo overlap assemblymethods, and wherein each circularized product in the third poolcomprises an insert sequence from the second pool and a pair of firstpolynucleotides and second polynucleotides from the first pool; (e)linearizing each circularized product in the third pool via digestion byone or more site-specific nuclease(s) that recognizes the one or moresite-specific nuclease recognition sequence(s) located between the firstpolynucleotide sequence and the second polynucleotide sequence in eachof the circularized products in the third pool; and (f) assembling thelinearized products into cloning vectors by in vitro or in vivo cloningmethods. In some cases, the common overlap sequence comprises anassembly overlap sequence of at least 1 nucleotide and the assembly instep (b) is performed by an overlap-based DNA assembly method. In somecases, the common overlap sequence comprises an assembly overlapsequence of from 10-25 nucleotides and the assembly in step (b) isperformed by an overlap-based DNA assembly method. In some cases, theoverlap-based DNA assembly method is selected from SOE-PCR or an invitro overlap-assembly method (e.g., HiFi assembly using NEB® HiFibuilder). In some cases, the one or more site-specific nucleaserecognition sequence(s) present in the common overlap sequence on the 5′end of the first polynucleotide is complementary to the one or moresite-specific nuclease recognition sequence(s) present in the commonoverlap sequence on the 3′ end of the second polynucleotide in eachpair, and wherein the utilizing the common overlap sequences of thefirst and second polynucleotides in each pair in step (b) entailsperforming SOE-PCR. 
In some cases, the utilizing the common overlapsequences of the first and second polynucleotides in each pair in step(b) entails digesting the one or more site-specific nuclease recognitionsequences present in the common overlap sequence on the 5′ end of thefirst polynucleotide and the 3′ end of the second polynucleotide in eachpair with one or more site specific nucleases for the one or moresite-specific nuclease recognition sequences to generate single-strandedoverhangs on the 5′ end of the first polynucleotide and the 3′ end ofthe second polynucleotide in each pair that comprise complementarysequence; and ligating the complementary sequence present on thesingle-stranded overhang on the 5′ end of the first polynucleotide andthe 3′ end of the second polynucleotide in each pair. The assembling instep (d) can be performed via in vitro or in vivo overlap assemblymethods. The assembling of step (d) can be performed using anoverlap-based DNA assembly method. The overlap-based DNA assembly can beselected from SOE-PCR and an in vitro overlap-assembly method (e.g.,HiFi assembly using NEB® HiFi builder). In some cases, the 3′ end of thefirst polynucleotide and the 5′ end of the second polynucleotide in thesingle nucleic acid fragment in each pair comprise an additional set ofone or more site-specific nuclease recognition sequences and the firstassembly overlap sequence and the second assembly overlap sequence ineach insert polynucleotide in the second pool comprise one or moresite-specific nuclease recognition sequences. 
In some cases, theassembling in step (d) entails digesting the additional one or moresite-specific nuclease recognition sequences present on the 3′ end ofthe first polynucleotide and the 5′ end of the second polynucleotide inthe single nucleic acid fragment in each pair and the one or moresite-specific nuclease recognition sequences present in the first andsecond assembly sequences in each insert polynucleotide from the secondpool with one or more site specific nucleases for the additional one ormore site-specific nuclease recognition sequences on the 3′ end of thefirst polynucleotide and the 5′ end of the second polynucleotide in thesingle nucleic acid fragment in each pair and the one or moresite-specific nuclease recognition sequences present in the first andsecond assembly sequences in each insert polynucleotide from the secondpool to generate a single-stranded overhang on the 3′ end of the firstpolynucleotide that comprises sequence complementary to sequence presenton a single-stranded overhang on the 5′ end of the first assemblysequence of an insert polynucleotide from the second pool and a singlestranded overhang on the 5′ end of the second polynucleotide thatcomprises sequence complementary to a sequence present on asingle-stranded overhang on the 3′end of the second assembly sequence ofthe same insert polynucleotide from the second pool; and ligating thecomplementary sequence present on the single-stranded overhangs. In somecases, the assembling of step (d) is performed via an in vitro cloningmethod, wherein the mixture of the first pool and the second pool isheated to partially or fully denature polynucleotides present in thefirst and the second pools, then cooled at a slow rate to roomtemperature before being subjected to the in vitro cloning method. Theassembling in step (f) can be performed via in vitro cloning methods orin vivo cloning methods. 
In some cases, the cloning vectors of step (f) comprise one or more site-specific nuclease recognition sequences. In some cases, the assembling in step (f) entails digesting the one or more site-specific nuclease recognition sequences in the cloning vectors with the one or more site-specific nucleases for the one or more site-specific nuclease recognition sequences present in the cloning vectors, wherein the digesting generates single-stranded overhangs on opposing ends of the cloning vectors, wherein the single-stranded overhang on one of the opposing ends of the cloning vector comprises sequence complementary to an end of the linearized product generated in step (e) and the single-stranded overhang on the other of the opposing ends of the cloning vectors comprises sequence complementary to an opposing end of the linearized product generated in step (e); and ligating the complementary sequences present on the single-stranded overhangs of the cloning vectors and the linearized products from step (e). A site-specific nuclease for use in any method or composition provided herein can be selected from a restriction endonuclease, Type IIs endonuclease(s), a homing endonuclease, an RNA-guided nuclease, a DNA-guided nuclease, a zinc-finger nuclease, a TALEN and a nicking enzyme or any combination thereof. The one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence can be one or more homing endonuclease recognition sequence(s). The one or more site-specific nuclease(s) for the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence can be a homing endonuclease.
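By way of non-limiting illustration, the topology of steps (b), (d) and (e) of the foregoing method can be sketched with toy strings. This is a simplified model only: sequences are written as top strands 5′ to 3′, the nuclease is modeled as a blunt cut in the middle of its site rather than as generating overhangs, and the site `SITE`, the sequences and the function names are hypothetical.

```python
SITE = "GGATGCAT"  # hypothetical stand-in for a nuclease recognition sequence


def join_pair(first, second, site=SITE):
    """Step (b): join the 3' end of the second polynucleotide to the 5' end
    of the first through the common sequence. The top strand then reads
    second-site-first, leaving the 5' end of the second and the 3' end of
    the first as the fragment termini."""
    return second + site + first


def circularize(fragment, insert, k):
    """Step (d): close the circle with an insert whose ends overlap the two
    fragment termini by k bases each."""
    if insert[:k] != fragment[-k:] or insert[-k:] != fragment[:k]:
        raise ValueError("insert overlaps do not match this pair")
    # The circle is stored as a linear string whose last base joins its first.
    return fragment + insert[k:-k]


def linearize(circle, site=SITE):
    """Step (e): cut once inside the site and read around the circle."""
    doubled = circle + circle  # finds a site that spans the string ends
    start = doubled.find(site)
    if start < 0 or start >= len(circle):
        raise ValueError("recognition site not found")
    cut = (start + len(site) // 2) % len(circle)
    return circle[cut:] + circle[:cut]


first, second, payload = "AAAACCCC", "TTGGAAAA", "ACGTACGT"
insert = first[-4:] + payload + second[:4]

fragment = join_pair(first, second)
circle = circularize(fragment, insert, k=4)
linear = linearize(circle)
```

In this sketch the linearized product reads first polynucleotide, payload, second polynucleotide, flanked by the two halves of the site, which corresponds to the arrangement that would be carried into the cloning vector in step (f).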

In another aspect, provided herein is a method for generating libraries of polynucleotides, the method comprising: (a) amplifying via polymerase chain reaction (PCR) a first pool of polynucleotides, wherein the first pool contains pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide, and wherein each first polynucleotide and each second polynucleotide in a pair comprises a first terminal 5′ end and an opposing second terminal 3′ end, wherein the amplifying introduces one or more recognition sequences for one or more site-specific nuclease(s) onto the first terminal 5′ end of a first polynucleotide and the second terminal 3′ end of a second polynucleotide in a pair from the first pool, wherein the one or more recognition sequences for the one or more site-specific nuclease(s) on the first terminal 5′ end of the first polynucleotide are complementary to the one or more recognition sequences for the one or more site-specific nuclease(s) on the second terminal 3′ end of the second polynucleotide in the pair; (b) assembling each pair of first polynucleotides and second polynucleotides from the first pool into a single nucleic acid fragment by performing a splicing by overlap extension polymerase chain reaction (SOE-PCR) utilizing the one or more complementary site-specific nuclease recognition sequence(s) on the first terminal 5′ ends of the first polynucleotides and the second terminal 3′ ends of the second polynucleotides within each pair, wherein the single nucleic acid fragment for each pair comprises a first polynucleotide and second polynucleotide separated by the one or more site-specific nuclease recognition sequence(s) from the first terminal 5′ end of the first polynucleotide and the second terminal 3′ end of the second polynucleotide, and wherein the second terminal 3′ end of the first polynucleotide and the first terminal 5′ end of the second polynucleotide in the single nucleic acid fragment for each pair are located on opposing terminal ends of the single nucleic acid fragment, distal to the one or more site-specific nuclease recognition sequence(s); (c) combining the single nucleic acid fragments for each pair with a second pool containing insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to the terminal 3′ end of the first polynucleotide present within the single nucleic acid fragment and a second assembly overlap sequence at its opposing 3′ end that is complementary to the terminal 5′ end of the second polynucleotide present within the single nucleic acid fragment; (d) assembling the first pool and the second pool into a third pool of circularized products, wherein the assembling is performed via in vitro or in vivo overlap assembly methods, and wherein each circularized product in the third pool comprises an insert sequence from the second pool and a pair of first polynucleotides and second polynucleotides from the first pool; (e) linearizing each circularized product in the third pool via addition of one or more site-specific nuclease(s) that recognize the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and second polynucleotide sequence in each of the circularized products in the third pool; and (f) assembling the linearized products into cloning vectors by in vitro or in vivo cloning methods. The assembling in step (d) can be performed via in vitro or in vivo overlap assembly methods. In some cases, the assembling of step (d) is performed using an overlap-based DNA assembly method. The overlap-based DNA assembly can be selected from SOE-PCR and an in vitro overlap-assembly method (e.g., HiFi assembly using NEB® HiFi builder).
In some cases, the 3′ end of thefirst polynucleotide and the 5′ end of the second polynucleotide in thesingle nucleic acid fragment in each pair comprise an additional set ofone or more site-specific nuclease recognition sequences and the firstassembly overlap sequence and the second assembly overlap sequence ineach insert polynucleotide in the second pool comprise one or moresite-specific nuclease recognition sequences. In some cases, theassembling in step (d) entails digesting the additional one or moresite-specific nuclease recognition sequences present on the 3′ end ofthe first polynucleotide and the 5′ end of the second polynucleotide inthe single nucleic acid fragment in each pair and the one or moresite-specific nuclease recognition sequences present in the first andsecond assembly sequences in each insert polynucleotide from the secondpool with one or more site specific nucleases for the additional one ormore site-specific nuclease recognition sequences on the 3′ end of thefirst polynucleotide and the 5′ end of the second polynucleotide in thesingle nucleic acid fragment in each pair and the one or moresite-specific nuclease recognition sequences present in the first andsecond assembly sequences in each insert polynucleotide from the secondpool to generate a single-stranded overhang on the 3′ end of the firstpolynucleotide that comprises sequence complementary to sequence presenton a single-stranded overhang on the 5′ end of the first assemblysequence of an insert polynucleotide from the second pool and a singlestranded overhang on the 5′ end of the second polynucleotide thatcomprises sequence complementary to a sequence present on asingle-stranded overhang on the 3′ end of the second assembly sequenceof the same insert polynucleotide from the second pool; and ligating thecomplementary sequence present on the single-stranded overhangs. 
In some cases, the assembling of step (d) is performed via an in vitro cloning method, wherein the mixture of the first pool and the second pool is heated to partially or fully denature polynucleotides present in the first and the second pools, then cooled at a slow rate to room temperature before being subjected to the in vitro cloning method. The assembling in step (f) can be performed via in vitro cloning methods or in vivo cloning methods. In some cases, the cloning vectors of step (f) comprise one or more site-specific nuclease recognition sequences. In some cases, the assembling in step (f) entails digesting the one or more site-specific nuclease recognition sequences in the cloning vectors with the one or more site-specific nucleases for the one or more site-specific nuclease recognition sequences present in the cloning vectors, wherein the digesting generates single-stranded overhangs on opposing ends of the cloning vectors, wherein the single-stranded overhang on one of the opposing ends of the cloning vector comprises sequence complementary to an end of the linearized product generated in step (e) and the single-stranded overhang on the other of the opposing ends of the cloning vectors comprises sequence complementary to an opposing end of the linearized product generated in step (e); and ligating the complementary sequences present on the single-stranded overhangs of the cloning vectors and the linearized products from step (e). A site-specific nuclease for use in any method or composition provided herein can be selected from a restriction endonuclease, Type IIs endonuclease(s), a homing endonuclease, an RNA-guided nuclease, a DNA-guided nuclease, a zinc-finger nuclease, a TALEN and a nicking enzyme or any combination thereof.

In one embodiment, the first polynucleotides and the second polynucleotides in the methods and compositions provided herein comprise sequence complementary or corresponding to a target genomic locus in a host cell. The sequence complementary or corresponding to the target genomic locus present in the first and second polynucleotides can be located on the terminus of said first and second polynucleotides that opposes the terminus of said first and second polynucleotides that comprises sequence complementary to assembly overlap sequences present on an insert polynucleotide. When comprising sequence complementary or corresponding to a target genomic locus in a host cell, the first and second polynucleotides can be referred to as homology arms. In particular, each first polynucleotide can be referred to as a left homology arm, while each second polynucleotide can be referred to as a right homology arm. When the first and second polynucleotides comprise sequence complementary or corresponding to a target genomic locus in a host cell, libraries of nucleic acid constructs generated through assembly of pairs of first and second polynucleotides and insert polynucleotides using the compositions and methods provided herein can be subsequently used in genome editing techniques for modifying the genome of a host cell. The host cell can be a prokaryotic cell or a eukaryotic host cell.

Polynucleotide Pairs

As described herein, the compositions and methods provided herein can comprise or utilize first polynucleotides and second polynucleotides such that each first polynucleotide is paired with a second polynucleotide. The first and second polynucleotides can be chemically synthesized (e.g., array synthesized or column synthesized) using any of the methods known in the art for synthesizing nucleic acids. The first and second polynucleotides can be amplified via an extension reaction (e.g., PCR) from existing DNA such as, for example, genomic DNA.

Each of the first and second polynucleotides can comprise functional sequence, non-functional sequence or a combination thereof. The functional sequence can refer to sequence that represents a gene or a portion or domain thereof or a regulatory element or a portion thereof. As described further herein, the gene or the portion thereof can encode a protein that is part of a metabolic or biochemical pathway. Also as described further herein, the regulatory element can be a promoter, terminator, solubility tag, degradation tag or degron. The non-functional sequence can refer to sequence that does not represent a gene or portion thereof or a regulatory element or a portion thereof. The non-functional sequence can be sequence that aids in or is utilized for the assembly of said first and second polynucleotides with an insert polynucleotide as provided herein. In one embodiment, each of the first and second polynucleotides comprises a mixture of functional and non-functional sequence. In another embodiment, each of the first and second polynucleotides comprises one or the other of functional or non-functional sequence. In embodiments where the first and second polynucleotides comprise only functional sequence, the functional sequence or a portion of the functional sequence can be utilized for the assembly of said first and second polynucleotides with an insert polynucleotide as provided herein.

The first and/or second polynucleotides can each vary in length and, in some cases, can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 200, 300, 400, 500, 600, 700, 800, 900, 950 or 1000 nucleotide bases in length and/or may be more than 1 kb or 2 kb in length. Alternatively, the first and/or second polynucleotides can be 2 kb or more, or 1 kb or more or more than 900 bases, 800 bases, 700 bases, 600 bases, 500 bases, 400 bases, 300 bases, 200 bases or 100 bases in length. The length of the first and/or second polynucleotides can be in the range of 100 nucleotides-2 kb, for example up to 100, up to 150, up to 200, up to 250, up to 300, up to 350, up to 400, up to 450, up to 500, up to 550, up to 600, up to 650, up to 700, up to 750, up to 800, up to 850, up to 900, up to 950, up to 1000, up to 1500, or up to 2000 nucleotides. The minimum length of the first and/or second polynucleotides may be defined by a preferable Tm that is determined empirically.

As described herein, each of the first and second polynucleotide sequences can comprise sequence that aids in the assembly of said first and second polynucleotides with an insert polynucleotide. In order to aid in said assembly, said sequence can be complementary to the assembly overlap sequences present on insert polynucleotides. The sequence complementary to the assembly overlap sequences present on insert polynucleotides can also be referred to as assembly overlap sequences. In one embodiment, the assembly overlap sequences represent the entire first and/or second polynucleotide. In another embodiment, the assembly overlap sequences represent only a portion of the first and/or second polynucleotides and the first and/or second polynucleotides further comprise additional sequence beyond the assembly overlap sequences. In one embodiment, a first polynucleotide in a pair of first and second polynucleotides as provided herein comprises an assembly overlap sequence at its distal or 3′ end that is complementary to a first assembly overlap sequence present at a 5′ or proximal end of an insert polynucleotide, while a second polynucleotide in said pair comprises an assembly overlap sequence at its proximal or 5′ end that is complementary to a second assembly overlap sequence present at a 3′ or distal end of said insert polynucleotide. Further to this embodiment, the first and second polynucleotide can each comprise additional sequence beyond the assembly overlap sequences. The additional sequence of the first and/or second polynucleotides can tailor said first and/or second polynucleotides to a specific application. The specific application can be any application known in the art that utilizes nucleic acid libraries, especially those that would benefit from a pooled deterministic assembly. Exemplary uses can include, but are not limited to, genome editing and pathway assembly.
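By way of non-limiting illustration, the complementarity between the 3′ end of a first polynucleotide and the 5′ assembly overlap sequence of an insert polynucleotide can be checked at the strand level. The sketch below assumes double-stranded parts in which a 5′-3′ exonuclease has exposed 3′ single-stranded ends, so that the 3′ end of the top strand of the first polynucleotide anneals to the 3′ end of the bottom strand of the insert; the sequences and the function names are hypothetical.

```python
# Translation table for Watson-Crick complements of unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def ends_anneal(top_strand, other_bottom_strand, k):
    """True if the 3'-terminal k bases of the two strands are perfectly
    complementary, i.e. the exposed 3' ends could base-pair."""
    return top_strand[-k:] == reverse_complement(other_bottom_strand[-k:])


# Hypothetical toy sequences, all written 5' to 3'.
first_top = "AAAACCCC"
unpaired_first_top = "AAAAGGTT"
insert_top = "CCCCACGTACGTTTGG"
insert_bottom = reverse_complement(insert_top)

paired_ok = ends_anneal(first_top, insert_bottom, k=4)
unpaired_ok = ends_anneal(unpaired_first_top, insert_bottom, k=4)
```

Under these assumptions the check is equivalent to asking whether the last k bases of the first polynucleotide's top strand equal the first k bases of the insert's top strand, which is the more convenient form when designing overlaps.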

The assembly overlap sequences present on a first and/or second polynucleotide can vary in length and, in some cases, can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides in length and/or may be up to 100 nucleotides in length (e.g., up to 50, up to 30, up to 25, up to 20 or up to 15 nucleotides in length). The length of the assembly overlap sequences can be in the range of 15 nucleotides to 100 nucleotides, for example, up to 20, up to 25, up to 30, up to 35, up to 40, up to 45, up to 50, up to 55, up to 60, up to 65, up to 70, up to 75, up to 80, up to 85, up to 90, up to 95 or up to 100 nucleotides. The assembly overlap sequences can be the same length as an assembly overlap sequence present on an insert polynucleotide. The minimum length of the assembly overlap sequence may be defined by a preferable Tm that is determined empirically. In one embodiment, the assembly overlap sequence on a first and/or second polynucleotide comprises 1 or more nucleotides that are complementary to an end of an insert polynucleotide. In another embodiment, the assembly overlap sequence on a first and/or second polynucleotide comprises about 25 nucleotides that are complementary to an end of an insert polynucleotide.

As shown in FIG. 1, each of the pairs of first and second polynucleotides can further comprise vector overlap sequences with a cloning vector such that the first polynucleotides (i.e., the first DNA fragment in FIG. 1) may comprise a vector overlap sequence to the cloning vector at its 5′ end, while the second polynucleotides (i.e., the second DNA fragment in FIG. 1) may comprise vector overlap sequences to the cloning vector at its 3′ end. In embodiments where each of the first polynucleotide and the second polynucleotide in a pair further comprise the first and second DNA fragments as provided herein, said first and second DNA fragments can be located downstream and adjacent to the vector overlap sequence to the cloning vector in the first polynucleotide and upstream and adjacent to the vector overlap sequence to the cloning vector in the second polynucleotide.

The vector overlap sequences can vary in length and, in some cases, can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30 nucleotides in length and/or may be up to 100 nucleotides in length (e.g., up to 50, up to 30, up to 25, up to 20 or up to 15 nucleotides in length). Alternatively, the vector overlap sequences can be 2 kb or less, or 1 kb or less, or less than 900 bases, 800 bases, 700 bases, 600 bases, 500 bases, 400 bases, 300 bases, 200 bases or 100 bases. The length of the vector overlap sequences can be in the range of 15 nucleotides to 80 nucleotides, for example, up to 20, up to 25, up to 30, up to 35, up to 40, up to 45, up to 50, up to 55, up to 60, up to 65, up to 70, up to 75, or up to 80 nucleotides. The minimum length of the vector overlap sequence may be defined by a preferable Tm that is determined empirically.

In one embodiment, a pool containing pairs of first and second polynucleotides is generated by selecting pairs of first and second polynucleotide sequences from a larger set of such sequences such that no polynucleotide from said pool shares common sequence with any other polynucleotide from said pool beyond a specified threshold, excluding designed overlap assembly sequences between the pairs of polynucleotides of said pool and insert polynucleotides or pools thereof as provided herein, or said pool and a cloning vector. The specified threshold can be at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 contiguous nucleotides. The specified threshold can be at most 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19 or 20 contiguous nucleotides. The specified threshold can be between 0 and 2, between 1 and 3, between 2 and 4, between 3 and 5, between 4 and 6, between 5 and 7, between 6 and 8, between 7 and 9, between 8 and 10, between 9 and 11, between 10 and 12, between 11 and 13, between 12 and 14, between 13 and 15, between 14 and 16, between 15 and 17, between 16 and 18, between 17 and 19, between 18 and 20 or between 19 and 21 contiguous nucleotides. The specified threshold can be between 0 and 5, between 0 and 10, between 0 and 15, between 0 and 20, between 5 and 10, between 5 and 15, between 5 and 20, between 10 and 15 or between 10 and 20 contiguous nucleotides. In one embodiment, the specified threshold is 12 contiguous nucleotides. Determination of a shared common sequence beyond the specified threshold can be done using a computer program that uses either BLAST analysis or simple substring searching to determine whether components share sequence with other components. If shared sequence is found beyond the specified threshold, the components would not be placed into a pool together.
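
The substring-search variant of the pool selection described above can be sketched as follows. This is a minimal illustration and not the program used by the applicants. It assumes that "beyond the specified threshold" means a shared contiguous run strictly longer than the threshold, that designed assembly and vector overlap sequences have already been trimmed from the inputs, and that only the given strand is compared. The function names (`kmers`, `shares_sequence`, `select_pairs`) and the greedy selection order are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all contiguous k-nucleotide substrings of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_sequence(a, b, threshold):
    """True if a and b share a contiguous run longer than `threshold` nt.

    Assumption: "beyond the threshold" is read as strictly longer, so the
    comparison uses (threshold + 1)-mers.
    """
    k = threshold + 1
    return not kmers(a, k).isdisjoint(kmers(b, k))


def select_pairs(candidate_pairs, threshold=12):
    """Greedily accept (first, second) pairs that do not clash with the pool.

    A pair is rejected if its two members share sequence with each other or
    with any polynucleotide already accepted into the pool.
    """
    pool = []       # accepted (first, second) pairs
    accepted = []   # flat list of every accepted polynucleotide
    for first, second in candidate_pairs:
        clash = shares_sequence(first, second, threshold) or any(
            shares_sequence(member, other, threshold)
            for member in (first, second)
            for other in accepted
        )
        if not clash:
            pool.append((first, second))
            accepted.extend((first, second))
    return pool
```

A greedy pass is only one possible selection strategy; it depends on candidate order and does not guarantee the largest compatible pool. A BLAST-based comparison, which the text also mentions, would additionally tolerate mismatches within the shared region.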

In one embodiment, pairing of an insert polynucleotide as described herein with a desired pair of first and second polynucleotides can be facilitated by preassembling the desired pair of first and second polynucleotides using an “inside-out assembly” method as shown in FIG. 2. In this method, the first and second polynucleotides can be amplified by PCR such that the vector proximal ends of the first polynucleotides each contain one or more unique site-specific nuclease site(s) or recognition sequence(s). The site-specific nuclease recognition sequences can be for site-specific nucleases selected from a restriction endonuclease, Type IIs endonuclease(s), a homing endonuclease, an RNA-guided nuclease, a DNA-guided nuclease, a zinc-finger nuclease, a TALEN and a nicking enzyme and any combinations thereof. In one embodiment, the vector proximal ends of the first polynucleotides each contain a single unique nuclease site or recognition sequence. In one embodiment, the unique nuclease recognition sequence is a unique restriction endonuclease site such that said restriction endonuclease site is not present in any of the polynucleotides present in a composition provided herein. In one embodiment, the unique nuclease site is a homing endonuclease sequence such as, for example, a homing endonuclease sequence specific for I-SceI or I-CeuI. A single pair of first and second polynucleotides is combined and a splicing by overlap extension polymerase chain reaction (SOE-PCR) is performed to assemble the two polynucleotides at the added unique nuclease sites (e.g., at the vector-proximal ends), leaving the ends that attach to an insert polynucleotide free. Alternatively, the entire sequence comprising the linked first and second polynucleotides can be synthesized directly using any of a variety of DNA synthesis methods known in the art. The attached first and second polynucleotides are assembled with the insert polynucleotide using an in vitro or in vivo overlap assembly method known in the art and/or provided herein, such as, for example, yeast (e.g., S. cerevisiae) or E. coli homologous recombination based assembly, Gibson assembly or NEB® HiFi builder. The circularized product of the first and second polynucleotides with the insert polynucleotide can be linearized with the addition of the nuclease specific for the unique nuclease sequence (e.g., the homing endonuclease for the specific homing endonuclease sequence), resulting in the insert polynucleotide being flanked by the first and second polynucleotides, which can then be assembled into the vector using Gibson assembly or other similar method.
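
The requirement that the added nuclease recognition sequence be absent from every polynucleotide in the composition can be checked with a simple search on both strands. The sketch below is illustrative only. The helper names are hypothetical, and the 18-bp sequence in the usage note is the commonly cited I-SceI recognition site, which should be verified against the enzyme supplier's documentation before use.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    complement = str.maketrans("ACGT", "TGCA")
    return seq.upper().translate(complement)[::-1]


def contains_site(seq, site):
    """True if the recognition site occurs in seq on either strand."""
    seq = seq.upper()
    site = site.upper()
    return site in seq or reverse_complement(site) in seq


def site_is_unique(site, polynucleotides):
    """True if the site is absent from every polynucleotide supplied.

    The site can then be added at the vector-proximal ends without creating
    a second cut position elsewhere in the composition.
    """
    return not any(contains_site(p, site) for p in polynucleotides)
```

For example, `site_is_unique("TAGGGATAACAGGGTAAT", pool_sequences)` would be evaluated over all first, second and insert polynucleotides (here a hypothetical list `pool_sequences`) before the site is appended by PCR.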

Insert Polynucleotides/Payload Sequences

In one embodiment, an insert polynucleotide for use in a composition, kit or method provided herein comprises: (1) a first assembly overlap sequence on a 5′ or proximal end of said insert polynucleotide, and (2) a second assembly overlap sequence on an opposing 3′ or distal end of said insert polynucleotide. Further to this embodiment, the first assembly overlap sequence can comprise sequence complementary to sequence (e.g., an assembly overlap sequence) at a 3′ or distal end of a first polynucleotide from a pair of first and second polynucleotides, while the second assembly overlap sequence can comprise sequence complementary to sequence (e.g., an assembly overlap sequence) at a 5′ or proximal end of a second polynucleotide from the pair of first and second polynucleotides.

The first assembly overlap sequence and the second assembly overlap sequence on an insert polynucleotide provided herein can comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50 or more nucleotides in length and/or may be up to 100 nucleotides in length (e.g., up to 50, up to 30, up to 25, up to 20 or up to 15 nucleotides in length) that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides as provided herein. The length of the assembly overlap sequences can be in the range of 15 nucleotides to 100 nucleotides, for example, up to 20, up to 25, up to 30, up to 35, up to 40, up to 45, up to 50, up to 55, up to 60, up to 65, up to 70, up to 75, up to 80, up to 85, up to 90, up to 95 or up to 100 nucleotides. In one embodiment, the first assembly overlap sequence and the second assembly overlap sequence on an insert polynucleotide provided herein comprise 1 or more nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides provided herein. In another embodiment, the first assembly overlap sequence and the second assembly overlap sequence on an insert polynucleotide provided herein comprise about 25 nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides provided herein.

In another embodiment, the insert polynucleotide further comprises one or more payload sequences such that said one or more payload sequences are located between the first and second assembly overlap sequences. A payload sequence can be a random sequence. A payload sequence can be a marker sequence. The marker sequence can be any marker sequence known in the art. A payload sequence can be a gene or a portion thereof. The gene or portion thereof can be part of a metabolic or biochemical pathway. The gene or portion thereof can encode a protein or a domain thereof. A payload sequence can be selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, nucleic acid sequence encoding degradation tags, terminators, barcodes, regulatory sequences or portions thereof. In some cases, the three components of the insert polynucleotide (i.e., the first assembly overlap sequence, the second assembly overlap sequence and the payload sequence) are synthesized or otherwise combined into contiguous pieces of DNA before use in an assembly method provided herein. In one embodiment, the first and second assembly overlaps are not random but designed to match specific pairs of first and second polynucleotides.
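
The relationship between a pair of first and second polynucleotides and its matching insert can be illustrated with plain strings. The sketch below makes a simplifying assumption: every molecule is written as its top strand, so a designed overlap is modeled as an identical terminal sequence, which is how complementary ends of double-stranded parts appear on one strand. The function names and the default of 25 nucleotides (taken from the "about 25 nucleotides" embodiment above) are illustrative.

```python
def design_insert(first, second, payload="", overlap_len=25):
    """Build an insert: first overlap + payload + second overlap.

    The first overlap copies the 3' (distal) end of the first polynucleotide;
    the second overlap copies the 5' (proximal) end of the second one.
    """
    if overlap_len < 1 or overlap_len > min(len(first), len(second)):
        raise ValueError("overlap_len must fit within both polynucleotides")
    return first[-overlap_len:] + payload + second[:overlap_len]


def assemble(first, insert, second, overlap_len=25):
    """Join first + insert + second through the designed overlaps.

    Raises ValueError if the insert was not designed for this pair, which is
    what makes the pooled assembly deterministic rather than combinatorial.
    """
    if len(insert) < 2 * overlap_len:
        raise ValueError("insert is shorter than its two overlaps")
    if not first.endswith(insert[:overlap_len]):
        raise ValueError("insert does not match the first polynucleotide")
    if not second.startswith(insert[-overlap_len:]):
        raise ValueError("insert does not match the second polynucleotide")
    payload = insert[overlap_len:len(insert) - overlap_len]
    return first + payload + second
```

An empty payload is allowed, which corresponds to the deletion case in which the insert consists only of the two assembly overlap sequences.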

In embodiments where the pair of first and second polynucleotides comprise targeting sequences as described herein, a payload sequence present within an insert polynucleotide can result in an insertion relative to the original locus targeted by the targeting sequences on the pair of first and second polynucleotides, a deletion of sequence relative to the original locus targeted by the targeting sequences on the pair of first and second polynucleotides, or a replacement of one sequence with another. In the case of an insertion or modification, the ‘payload’ can be the intended final sequence. In the case of a deletion, the ‘payload’ can be a marker sequence or no sequence.

In one embodiment, the insert polynucleotides are used in a pooled fashion. Further to this embodiment, each insert polynucleotide in a pool of insert polynucleotides can comprise a first assembly overlap sequence that comprises sequence complementary to sequence (e.g., an assembly overlap sequence) at a 3′ or distal end of a first polynucleotide from a pair of first and second polynucleotides and a second assembly overlap sequence that comprises sequence complementary to sequence (e.g., an assembly overlap sequence) at a 5′ or proximal end of a second polynucleotide from the pair of first and second polynucleotides.

The pool of insert polynucleotides can contain any number of unique insert polynucleotide sequences. The number of insert polynucleotides can be at least, at most, or about 1, 5, 10, 25, 50, 75, 100, 125, 150, 175, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700, 1800, 1900, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10,000, 20,000, 30,000, 40,000, 50,000, 75,000, 100,000, 150,000, 200,000 or 250,000 unique insert polynucleotides with or without a payload sequence.

A payload sequence can be at most or at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 450, 500, 600, 700, 800, 900, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000 or 10,000 nucleotides in length. In some cases, the payload sequence can be 0 nucleotides in length. A payload sequence can be at a length such that when incorporated into an insert polynucleotide, the entire insert polynucleotide can be chemically synthesized. The synthesis can be an array-based or column-based synthesis method as known in the art. In one embodiment, a payload sequence is of a length such that it can be directly included or synthesized in an insert oligonucleotide along with the first and second assembly overlaps. An insert polynucleotide that can be synthesized can be up to about 1, 5, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160, 170, 180, 190, 200, 210, 220, 250, 300, 350, 400 or more nucleotides in length.

In another embodiment, the insert polynucleotide can be generated in a single pool using the methods described in FIG. 3. As shown in FIG. 3, the payload sequence (e.g., the promoter sequence in FIG. 3) can be generated via PCR from three components: a pooled forward primer, a common reverse primer, and a payload template sequence (e.g., the promoter in FIG. 3). The payload sequence template can be a synthetic DNA fragment, a PCR product, or other single- or double-stranded DNA fragment. The pool of forward primers can be synthesized using array-based or column-based synthetic methods known in the art. Each forward primer in the pool can comprise (from 5′ to 3′): 1) a sequence complementary to the distal or 3′ end of the payload template sequence, 2) a second assembly overlap sequence comprising sequence complementary to a second polynucleotide from a pair of first and second polynucleotides, 3) one or more recognition sequences for one or more site-specific nucleases (e.g., a homing endonuclease site or recognition sequence), 4) a first assembly overlap sequence comprising sequence complementary to a first polynucleotide from a pair of first and second polynucleotides, and 5) a priming sequence that binds to the proximal end or 5′ end of the payload template sequence. The common reverse primer can bind to the distal end or 3′ end of the payload template sequence or to other sequence downstream of the payload sequence. PCR can be performed on the payload template sequence (e.g., the promoter in FIG. 3) using the pooled forward primers and the common reverse primer. After amplification, the PCR product can be circularized to generate a circular-permuted payload (insert) using an overlap assembly method known in the art, such as, for example, Gibson assembly, NEB® HIFI assembly, or similar methods, and then linearized using one or more site-specific nuclease(s) that recognize the one or more site-specific nuclease recognition sequences (e.g., the homing endonuclease, I-SceI in FIG. 3). Nuclease digestion can result in a fragment suitable for use as an insert polynucleotide (e.g., the “payload” part described in FIG. 1), with the large payloads flanked by first and second assembly overlap sequences (e.g., the homology arms or regions flanking the promoter sequence in FIG. 3). As shown in FIG. 3, at the ends of the payload sequences can be small partial nuclease recognition sequences (e.g., I-SceI in FIG. 3) that can be excised by the overlap assembly method utilized (e.g., 3′ and 5′ exonuclease activities of Gibson assembly reagents, NEB® HIFI assembly reagents, or equivalent mixtures). The product can be optionally amplified (e.g., RCA) after circularization and before linearization.
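
The circular permutation described above can be traced with a string model. This is a sketch under stated assumptions and not the protocol of FIG. 3: all sequences are written as top-strand strings, the common reverse primer is assumed to amplify through the end of the payload template, and the nuclease is modeled as a blunt cut at the midpoint of its recognition sequence (a real homing endonuclease such as I-SceI leaves short overhangs). The function names and the `homology_len` and `priming_len` parameters are hypothetical.

```python
def forward_primer(payload, ao1, ao2, site, homology_len=20, priming_len=20):
    """Compose a pooled forward primer, 5' to 3', from its five elements:
    payload distal-end homology, second assembly overlap, nuclease site,
    first assembly overlap, priming sequence for the payload proximal end."""
    return (payload[-homology_len:] + ao2 + site + ao1
            + payload[:priming_len])


def pcr_product(primer, payload, priming_len=20):
    """Model the amplicon: the primer's 5' tail followed by the full payload."""
    if not primer.endswith(payload[:priming_len]):
        raise ValueError("primer 3' end does not match the payload template")
    return primer[:-priming_len] + payload


def circularize_and_linearize(product, site, homology_len=20):
    """Close the amplicon through its terminal homology, then cut at the site.

    The result is the circular permutation: partial site, first overlap,
    payload, second overlap, partial site.
    """
    if product[:homology_len] != product[-homology_len:]:
        raise ValueError("product ends do not share the expected homology")
    circle = product[homology_len:]          # keep one copy of the repeat
    cut = circle.index(site) + len(site) // 2
    return circle[cut:] + circle[:cut]
```

After linearization the payload sits between the first and second assembly overlap sequences, with half of the recognition sequence left at each end, consistent with the partial nuclease recognition sequences that the text says are removed during overlap assembly.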

In one embodiment, each insert polynucleotide comprises a payload sequence such that each insert polynucleotide in a pool of insert polynucleotides comprises a different payload sequence from the payload sequence in each other insert polynucleotide in said pool.

In another embodiment, each insert polynucleotide comprises a payload sequence such that each insert polynucleotide in a pool of insert polynucleotides comprises the same payload sequence as the payload sequence in each other insert polynucleotide in said pool.

Cloning Methods

As described herein, a composition comprising pairs of first and second polynucleotides as well as insert polynucleotides can be assembled into a library of nucleic acids comprising first and second polynucleotides with an insert polynucleotide therebetween. Assembly of the pairs of first and second polynucleotides with the insert polynucleotides as provided herein can be performed by either in vitro or in vivo cloning methods. For the assembly of large DNA molecules, the final steps of the assembly may be conducted in vivo, such as in a yeast host cell. The balance between use of in vitro and in vivo assembly steps can be determined by the practicality of the method with regard to the nature of the nucleic acid molecules to be assembled.

In one embodiment, assembly of the pairs of first and second polynucleotides with the insert polynucleotides is performed using an in vitro cloning method. The in vitro cloning method can be any in vitro cloning method that employs overlap assembly known in the art. The in vitro cloning method used in the methods provided herein can be selected from In-Fusion cloning (Clontech®), Golden Gate Assembly, Gateway Assembly, Gibson Assembly, and NEB® HIFI assembly or any other suitable in vitro cloning method known in the art. In-Fusion cloning can entail mixing a first pool of pairs of first and second polynucleotides as provided herein and a second pool of insert polynucleotides as described herein with the In-Fusion cloning reagent and then transforming the resultant assemblies into an E. coli cloning host cell. The in vitro cloning method can be any of the overlap assembly methods described in U.S. Pat. No. 8,968,999, which is herein incorporated by reference in its entirety. The in vitro cloning method can be any of the overlap assembly methods described in US20160060671, which is herein incorporated by reference in its entirety. The in vitro cloning method can be the Gibson assembly method described in Jun Urano, Ph.D. and Christine Chen, Ph.D., Gibson Assembly® Primer-Bridge End Joining (PBnJ) Cloning, Synthetic Genomics Application Note, which is herein incorporated by reference in its entirety. In one embodiment, the pairs of first and second polynucleotides, insert polynucleotides and a cloning vector in a composition are joined using a 5′-3′ exonuclease and a strand-displacing polymerase also present in the composition. The composition can also comprise a buffer containing a potassium salt such as potassium chloride in a concentration range of 7 mM to 150 mM, for example, 20 mM to 50 mM. A sodium salt (e.g., sodium chloride) in the range of 10 mM to 100 mM, such as 20 mM, may also be used in addition to the potassium salt. In some embodiments, the composition does not contain a crowding agent such as polyethylene glycol (PEG), Ficoll, or dextran. In some embodiments, the composition comprises a single-stranded (ss) binding protein. A ss DNA binding protein for use in the composition may be E. coli recA, T7 gene 2.5 product, RedB (from phage lambda), RecT (from Rac prophage), ET SSB (extreme thermostable single-stranded DNA binding protein) or any other ss DNA binding protein known in the art. The inclusion of a ss binding protein can improve the efficiency of assembly, as measured by colony number, beyond what would otherwise occur in the absence of the ss binding protein, particularly for nucleic acid fragments with longer overlap sequences (e.g., at least 20 nucleotides). In some embodiments, the composition does not contain a non-strand displacing polymerase.

In another embodiment, the pairs of first and second polynucleotides, insert polynucleotides and a cloning vector in a composition are joined using an isolated non-thermostable 5′ to 3′ exonuclease that lacks 3′ exonuclease activity, a crowding agent, a non-strand-displacing DNA polymerase with 3′ exonuclease activity, or a mixture of said DNA polymerase with a second DNA polymerase that lacks 3′ exonuclease activity, and a ligase. The composition can further comprise a mixture of dNTPs, and a suitable buffer, under conditions that are effective for joining the polynucleotides and the cloning vector. In some embodiments, the composition can further comprise a crowding agent. The crowding agent can be selected from polyethylene glycol (PEG), dextran or Ficoll. In one embodiment, the crowding agent is PEG. The PEG can be used at a concentration of from about 3 to about 7% (weight/volume). The PEG can be selected from PEG-200, PEG-4000, PEG-6000, PEG-8000 or PEG-20,000. In some embodiments, the exonuclease is a T5 exonuclease and the contacting is under isothermal conditions, and/or the crowding agent is PEG, and/or the non-strand-displacing DNA polymerase is PHUSION® DNA polymerase or VENTR® DNA polymerase, and/or the ligase is a Taq ligase.

In one embodiment, assembly of the pairs of first and second polynucleotides with the insert polynucleotides is performed using an in vivo cloning method. The in vivo cloning method can be any in vivo cloning method known in the art. The in vivo cloning method can be a homologous recombination mediated cloning method. The in vivo cloning method used in the methods provided herein can be selected from E. coli (RecA-dependent, RecA-independent or Red/ET-dependent) homologous recombination, Overlap Extension PCR and Recombination (OEPR) cloning, yeast homologous recombination, and Transformation-associated recombination (TAR) cloning and gene assembly in Bacillus as described in Tsuge, Kenji et al. “One step assembly of multiple DNA fragments with a designed order and orientation in Bacillus subtilis plasmid.” Nucleic Acids Research vol. 31, 21 (2003): e133, which is herein incorporated by reference.

Applications

The composition and assembly methods provided herein can be used to construct any desired assembly, such as plasmids, vectors, genes, metabolic pathways, minimal genomes, partial genomes, genomes, chromosomes, extrachromosomal nucleic acids, for example, cytoplasmic organelles, such as mitochondria (animals), and in chloroplasts and plastids (plants), and the like.

The compositions and assembly methods provided herein can be used to generate libraries of nucleic acid molecules, and methods to use modified whole or partial nucleic acid molecules as generated therefrom. The libraries can contain 2 or more variants, and said multiple variants can be screened for members having desired characteristics, such as high production levels of desired products of interest, enhanced functionality of the product of interest, or decreased functionality (if that is advantageous). Such screening may be done by high throughput methods, which may be robotic/automated as provided herein.

The disclosure also further includes products made by the compositions and assembly methods provided herein, for example, the resulting assembled synthetic genes or genomes (synthetic or naturally occurring) and modified optimized genes and genomes, and the use(s) thereof.

The compositions and assembly methods provided herein can have a wide variety of applications, permitting, for example, the design of pathways for the synthesis of desired products of interest or optimization of one or more sequences whose gene products play a role in the synthesis or expression of a desired product. The compositions and assembly methods provided herein can also be used to generate optimized sequences of a gene or expression thereof or to combine one or more functional domains or motifs of protein encoded by a gene. The gene can be part of a biochemical or metabolic pathway. The biochemical or metabolic pathway can produce a desired product of interest.

The desired product of interest can be any molecule that can be assembled in a cell culture, eukaryotic or prokaryotic expression system or in a transgenic animal or plant. Thus, the nucleic acid molecules or libraries thereof that result from the deterministic assembly methods provided herein may be employed in a wide variety of contexts to produce desired products of interest. In some cases, the product of interest may be a small molecule, enzyme, peptide, amino acid, organic acid, synthetic compound, fuel, alcohol, etc. For example, the product of interest or biomolecule may be any primary or secondary extracellular metabolite. The primary metabolite may be, inter alia, ethanol, citric acid, lactic acid, glutamic acid, glutamate, lysine, threonine, tryptophan and other amino acids, vitamins, polysaccharides, etc. The secondary metabolite may be, inter alia, an antibiotic compound like penicillin, or an immunosuppressant like cyclosporin A, a plant hormone like gibberellin, a statin drug like lovastatin, a fungicide like griseofulvin, etc. The product of interest or biomolecule may also be any intracellular component produced by a host cell, such as: a microbial enzyme, including: catalase, amylase, protease, pectinase, glucose isomerase, cellulase, hemicellulase, lipase, lactase, streptokinase, and many others. The intracellular component may also include recombinant proteins, such as: insulin, hepatitis B vaccine, interferon, granulocyte colony-stimulating factor, streptokinase and others. The product of interest may also refer to a protein of interest.

Pathway Assembly

In one embodiment, the compositions and methods provided herein are used to assemble a gene or a variant thereof. The gene or variant thereof can encode a protein that is part of a metabolic or biochemical pathway. The variant can be a codon optimized version or mutated version of said gene. The metabolic or biochemical pathway can produce a product of interest as provided herein. In one embodiment, the gene sequence or variant thereof can be present as a payload sequence within an insert polynucleotide as provided herein. The pairs of first and second polynucleotides can comprise sequence that, when assembled with said insert polynucleotide, can serve to facilitate targeting of and insertion into a locus in a genetic element (e.g., genome, plasmid, etc.) within a host cell using a gene editing method as provided herein. The locus can be a specific locus or a random locus. Alternatively, the pairs of first and second polynucleotides can comprise sequence that, when assembled with said insert polynucleotide, can serve to facilitate further assembly of the resultant assembly with other assemblies generated using the methods provided herein. The other assemblies can comprise one or more additional genes present within the same metabolic or biochemical pathway and in this way facilitate the assembly of said metabolic or biochemical pathway. All of the genes or variants thereof can be assembled using the technique described herein of overlapping sequences on a single vector for a particular metabolic or biochemical pathway, or independent vectors for each member of said pathway can be employed by mixing the vectors for each member in successive transformation mixtures. The assembly of the first and second polynucleotides with an insert polynucleotide can be accomplished via assembly overlap sequences present in each of the polynucleotides using the assembly overlap methods provided herein. The pairs of first and second polynucleotides can further comprise vector overlap sequence as provided herein to facilitate assembly into a suitable vector. The vector can be a replicating plasmid. In some cases, the first and/or second polynucleotide can further comprise sequence of a regulatory or control element that can govern an aspect of the gene or variant thereof or the protein encoded thereby, such as the transcription, translation, solubility, or degradation thereof. The regulatory or control element can be a promoter, terminator, solubility tag, degradation tag or degron.

In another embodiment, the gene sequence or variant thereof is spread across a pair of first and second polynucleotides and an insert polynucleotide located therebetween or spread across a first or second polynucleotide and an insert polynucleotide located therebetween. By suitable assembly overlap segments on each of the polynucleotides, a mixture containing all of the polynucleotides can be assembled in the correct order in a single reaction mixture using overlap assembly as provided herein. The resultant assemblies will be full-length coding sequences of the gene or variant thereof. The pairs of first and second polynucleotides can further comprise sequence that, when assembled with said insert polynucleotide, can serve to facilitate targeting of and insertion into a locus in a genetic element (e.g., genome, plasmid, etc.) within a host cell using a gene editing method as provided herein. The locus can be a specific locus or a random locus. Alternatively, the pairs of first and second polynucleotides can further comprise sequence that, when assembled with said insert polynucleotide, can serve to facilitate further assembly of the resultant assembly with other assemblies generated using the methods provided herein. The other assemblies can comprise one or more additional genes present within the same metabolic or biochemical pathway and in this way facilitate the assembly of said metabolic or biochemical pathway. All of the genes or variants thereof can be assembled using the technique described herein of overlapping sequences on a single vector for a particular metabolic or biochemical pathway, or independent vectors for each member of said pathway can be employed by mixing the vectors for each member in successive transformation mixtures. The pairs of first and second polynucleotides can further comprise vector overlap sequence as provided herein to facilitate assembly into a suitable vector. The vector can be a replicating plasmid. In some cases, the first and/or second polynucleotide can further comprise sequence of a regulatory or control element that can govern an aspect of the gene or variant thereof or the protein encoded thereby, such as the transcription, translation, solubility, or degradation thereof. The regulatory or control element can be a promoter, terminator, solubility tag, degradation tag or degron.

In still another embodiment, the compositions and methods provided herein are used to assemble or combine nucleic acid sequences that encode motifs or domains of a target protein. The nucleic acid sequence encoding a particular motif or domain of a target protein can be spread across a pair of first and second polynucleotides and an insert polynucleotide located therebetween or spread across a first or second polynucleotide and an insert polynucleotide located therebetween. The nucleic acid sequence encoding a particular motif or domain of a target protein can be present on a first polynucleotide, while a second motif or domain of the target protein can be present on a second polynucleotide and an insert polynucleotide can be used to join said first and second motif or domain of the target protein using assembly overlap sequences present on each polynucleotide and overlap assembly methods as provided herein. In some cases, the insert polynucleotide can comprise a portion of the first and/or second motif or domain. In some cases, the insert polynucleotide can comprise a third motif or domain of the target protein. The pairs of first and second polynucleotides can further comprise sequence that, when assembled with said insert polynucleotide, can serve to facilitate targeting of and insertion into a locus in a genetic element (e.g., genome, plasmid, etc.) within a host cell using a gene editing method as provided herein. The locus can be a specific locus or a random locus. The pairs of first and second polynucleotides can further comprise vector overlap sequence as provided herein to facilitate assembly into a suitable vector. The vector can be a replicating plasmid.

Gene Editing

As described herein, a composition comprising pairs of first and second polynucleotides as well as insert polynucleotides can be assembled into a library of nucleic acids comprising first and second polynucleotides with an insert polynucleotide therebetween that can be subsequently utilized to modify the genetic content of a host cell. As provided herein, the library of nucleic acids can comprise control elements (e.g., promoters, terminators, solubility tags, degradation tags or degrons), modified forms of genes (e.g., genes with desired SNP(s)), antisense nucleic acids, and/or one or more genes that are part of a metabolic or biochemical pathway. In one embodiment, the modification entails gene editing of the host cell. The gene editing can entail editing the genome of the host cell and/or a separate genetic element present in the host cell such as, for example, a plasmid or cosmid. The gene editing method that can utilize nucleic acid assemblies generated using the methods and compositions as provided herein can be any gene editing method or system known in the art and can be selected based on the host for which gene editing is desired. Non-limiting examples of gene editing include homologous recombination, CRISPR, TALENs, FokI, or other endonucleases.

Homologous Recombination

In one embodiment, the gene editing method is a homologous recombination based method known in the art. The homologous recombination based method can be selected from single-crossover homologous recombination, double-crossover homologous recombination, or lambda red recombineering. Further to this embodiment, the first polynucleotide and the second polynucleotide in a pair of first and second polynucleotides each comprise sequence directed to or complementary to a desired locus in a nucleic acid element (e.g., genome, plasmid or cosmid) of a host cell and thereby direct an insert polynucleotide located therebetween to the desired locus in the genetic element (e.g., genome, cosmid or plasmid) of the host cell. Accordingly, the sequence directed to or complementary to a desired locus present in the pair can be used to determine the location(s) in the genome, cosmid or plasmid that will be targeted for editing. As exemplified in FIG. 1, the sequence directed to or complementary to a desired locus can be located at or toward the proximal or 5′ end of a first polynucleotide, while in the second polynucleotide the sequence directed to or complementary to a desired locus can be located at or near the distal or 3′ end. In the first polynucleotide, the sequence directed to or complementary to a desired locus can be located upstream of an assembly overlap sequence present in the first polynucleotide and downstream of a vector overlap sequence, if present. In the second polynucleotide, the sequence directed to or complementary to a desired locus can be located downstream of an assembly overlap sequence present in the second polynucleotide and upstream of a vector overlap sequence, if present.
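By way of non-limiting illustration only, the ordering of elements described above can be sketched in Python. The toy sequences, the element names and the helper functions `join_by_overlap` and `assemble` are hypothetical and are not taken from FIG. 1; the sketch simply assumes that the insert polynucleotide shares an identical assembly overlap sequence with each member of the pair.

```python
# Hypothetical toy sequences; real elements would be considerably longer.
VECTOR_OVERLAP_5 = "AAAACCCC"
TARGET_LEFT = "GATTACAGAT"       # sequence directed to the desired locus (first polynucleotide)
ASSEMBLY_OVERLAP_1 = "CCGGAATT"  # shared by first polynucleotide and insert
INSERT_PAYLOAD = "ATGAAATAA"
ASSEMBLY_OVERLAP_2 = "GGTTCCAA"  # shared by insert and second polynucleotide
TARGET_RIGHT = "TCTAGACTAG"      # sequence directed to the desired locus (second polynucleotide)
VECTOR_OVERLAP_3 = "GGGGTTTT"

# First polynucleotide: vector overlap, then targeting sequence, then assembly overlap.
first = VECTOR_OVERLAP_5 + TARGET_LEFT + ASSEMBLY_OVERLAP_1
# Insert polynucleotide: flanked by both assembly overlaps.
insert = ASSEMBLY_OVERLAP_1 + INSERT_PAYLOAD + ASSEMBLY_OVERLAP_2
# Second polynucleotide: assembly overlap, then targeting sequence, then vector overlap.
second = ASSEMBLY_OVERLAP_2 + TARGET_RIGHT + VECTOR_OVERLAP_3


def join_by_overlap(left, right, overlap_len):
    """Join two sequences whose adjoining ends share an identical overlap."""
    if overlap_len <= 0:
        raise ValueError("overlap_len must be positive")
    if left[-overlap_len:] != right[:overlap_len]:
        raise ValueError("terminal sequences do not overlap")
    return left + right[overlap_len:]


def assemble(first, insert, second, overlap_len):
    """Model the first + insert + second assembly as two overlap joins."""
    return join_by_overlap(join_by_overlap(first, insert, overlap_len), second, overlap_len)


assembly = assemble(first, insert, second, overlap_len=8)
```

In this simplified model the targeting sequences end up flanking the insert payload, with the vector overlap sequences at the outermost ends of the assembly.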

In one embodiment, for each pair in a pool containing pairs of first and second polynucleotides, the sequence that is complementary to a desired locus in a pair is complementary to a different target locus in a host cell as compared to each other pair in said pool.

In another embodiment, for each pair in a pool containing pairs of first and second polynucleotides, the sequence that is complementary to a desired locus in a pair is complementary to the same target locus in a host cell as compared to each other pair in said pool.

Loop-in/Loop-Out

In some embodiments, the present disclosure teaches methods of looping out selected regions of DNA from the host organisms. The looping out method can be as described in Nakashima et al. 2014 “Bacterial Cellular Engineering by Genome Editing and Gene Silencing.” Int. J. Mol. Sci. 15(2), 2773-2793. Looping out deletion techniques are known in the art, and are described in Tear et al. 2014 “Excision of Unstable Artificial Gene-Specific Inverted Repeats Mediates Scar-Free Gene Deletions in Escherichia coli.” Appl. Biochem. Biotech. 175:1858-1867. The looping out methods used in the methods provided herein can be performed using single-crossover homologous recombination or double-crossover homologous recombination. In one embodiment, looping out of selected regions can entail using single-crossover homologous recombination.

In one embodiment, a composition provided herein comprises pairs of first and second polynucleotides (e.g., left/right homology arms), insert polynucleotides and a vector such that assembly of the pairs of first and second polynucleotides with an insert polynucleotide and a vector using an in vitro or in vivo assembly method as provided herein generates loop-out vectors. In one embodiment, single-crossover homologous recombination is used between a loop-out vector and the host cell genome in order to loop-in said vector. The vector could comprise a marker that facilitates selection of looped-out clones after the loop-in step. In another embodiment, double-crossover homologous recombination is used between a loop-out vector and the host cell genome in order to integrate said vector. The insert sequence within the loop-out vector can be designed with a sequence that is a direct repeat of an existing or introduced nearby host sequence, such that the direct repeats flank the region of DNA slated for looping and deletion. The insert sequence could further comprise a marker that facilitates selection of looped-out clones. Once inserted, cells containing the loop-out plasmid or vector can be counter selected for deletion of the selection region.

In one aspect provided herein, polynucleotides or polynucleotide libraries generated using the compositions and/or methods provided herein can be used in a gene editing method that can entail the use of sets of proteins from one or more recombination systems. Said recombination systems can be endogenous to the microbial host cell or can be introduced heterologously. The sets of proteins of the one or more heterologous recombination systems can be introduced as nucleic acids (e.g., as plasmid, linear DNA or RNA, or integron) and be integrated into the genome of the host cell or be stably expressed from an extrachromosomal element. The sets of proteins of the one or more heterologous recombination systems can be introduced as RNA and be translated by the host cell. The sets of proteins of the one or more heterologous recombination systems can be introduced as proteins into the host cell. The sets of proteins of the one or more recombination systems can be from a lambda red recombination system, a RecET recombination system, a Red/ET recombination system, any homologs, orthologs or paralogs of proteins from a lambda red recombination system, a RecET recombination system, or Red/ET recombination system or any combination thereof. The recombination methods and/or sets of proteins from the RecET recombination system can be any of those as described in Zhang Y., Buchholz F., Muyrers J. P. P. and Stewart A. F. “A new logic for DNA engineering using recombination in E. coli.” Nature Genetics 20 (1998) 123-128; Muyrers, J. P. P., Zhang, Y., Testa, G., Stewart, A. F. “Rapid modification of bacterial artificial chromosomes by ET-recombination.” Nucleic Acids Res. 27 (1999) 1555-1557; Zhang Y., Muyrers J. P. P., Testa G. and Stewart A. F. “DNA cloning by homologous recombination in E. coli.” Nature Biotechnology 18 (2000) 1314-1317 and Muyrers J P et al., “Techniques: Recombinogenic engineering–new options for cloning and manipulating DNA” Trends Biochem Sci. 2001 May; 26(5):325-31, which are herein incorporated by reference. The sets of proteins from the Red/ET recombination system can be any of those as described in Rivero-Müller, Adolfo et al. “Assisted large fragment insertion by Red/ET-recombination (ALFIRE)–an alternative and enhanced method for large fragment recombineering” Nucleic acids research vol. 35, 10 (2007): e78, which is herein incorporated by reference.

Lambda RED Mediated Gene Editing

Gene editing as described herein can be performed using Lambda Red-mediated homologous recombination as described by Datsenko and Wanner, PNAS USA 97:6640-6645 (2000), the contents of which are hereby incorporated by reference in their entirety.

To use the lambda red recombineering system to modify target DNA, a linear donor DNA substrate (either dsDNA or ssDNA) can be electroporated into E. coli expressing the set of proteins from the lambda red recombination system. The set of proteins from the lambda red recombination system can comprise the exo, beta or gam proteins or any combination thereof. Gam can prevent both the endogenous RecBCD and SbcCD nucleases from digesting the linear donor DNA (either dsDNA or ssDNA) introduced into a microbial host cell. Exo is a 5′-3′ dsDNA-dependent exonuclease that can degrade linear dsDNA starting from the 5′ end and generate two possible products (i.e., a partially dsDNA duplex with single-stranded 3′ overhangs or a ssDNA whose entire complementary strand was degraded). Beta can protect the ssDNA created by Exo and promote its annealing to a complementary ssDNA target in the cell. Beta expression can be required for lambda red based recombination with an ssDNA oligo substrate as described at blog.addgene.org/lambda-red-a-homologous-recombination-based-technique-for-genetic-engineering, the contents of which are herein incorporated by reference.

The linear donor DNA substrate (either dsDNA or ssDNA) can be an assembly comprising a pair of first and second polynucleotides with an insert polynucleotide located therebetween generated using the methods and compositions provided herein. The pair of first and second polynucleotides can comprise genomic targeting sequences that target said donor DNA substrate to a specific locus in the genome of the host cell. The proteins of the lambda red recombination system then catalyze the homologous recombination of the substrate with the target DNA sequence. Cloning thus occurs in vivo, in contrast to restriction enzyme cloning, where the genetic changes are made in a test tube. The donor DNA substrate requires only ~50 nucleotides of homology to the target site for recombination. As described at blog.addgene.org/lambda-red-a-homologous-recombination-based-technique-for-genetic-engineering, whether a linear dsDNA or ssDNA substrate is used can depend on the goal of the experiment. dsDNA substrate may be best for insertions or deletions greater than approximately 20 nucleotides, while ssDNA substrate may be best for point mutations or changes of only a few base pairs.

dsDNA substrate can be made using the compositions and methods provided herein such that the pairs of first and second polynucleotides comprise about 50 base pairs of homology to the targeted insert site on opposing terminal ends. The dsDNA insert polynucleotide present within the substrate can include: large insertions or deletions, including selectable DNA fragments, such as antibiotic resistance genes, as well as non-selectable DNA fragments, such as gene replacements and tags.

ssDNA substrates can also be made using the compositions and methods provided herein such that the pairs of first and second polynucleotides comprise about 50 base pairs of homology to the targeted insert site on opposing terminal ends and can have the desired alteration(s) located in the center of the sequence (i.e., within the insert polynucleotide).
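As a non-limiting sketch of the substrate layout just described, the following Python function builds a donor sequence from a target sequence by taking a homology arm from each side of the site to be altered. The function name `make_donor`, its parameters and the toy target sequence are hypothetical, and the default arm length of 50 reflects the approximate figure given above rather than a required value.

```python
def make_donor(target, start, end, payload, arm_len=50):
    """Return left arm + payload + right arm for replacing target[start:end].

    `target` is the sequence of the locus to be edited, `start` and `end`
    delimit the region to be replaced (0-based, end-exclusive), and `payload`
    is the sequence carried between the homology arms.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    if start - arm_len < 0 or end + arm_len > len(target):
        raise ValueError("target sequence too short for the requested homology arms")
    left_arm = target[start - arm_len:start]   # homology upstream of the edit site
    right_arm = target[end:end + arm_len]      # homology downstream of the edit site
    return left_arm + payload + right_arm


# Toy example: replace a 5-base region with a 3-base payload.
target = "A" * 60 + "GGGGG" + "C" * 60
donor = make_donor(target, start=60, end=65, payload="TTT")
print(len(donor))  # 103
```

An empty payload models a deletion, and setting `start` equal to `end` models a pure insertion.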

ssDNA substrate can be more efficient than dsDNA, with a recombination frequency of between 0.1% and 1% that can be increased to as high as 25-50% by designing substrates that avoid activating the methyl-directed mismatch repair (MMR) system. The MMR system corrects DNA mismatches that occur during DNA replication. Activation of MMR can be avoided by: 1) using a strain of bacteria that has key MMR proteins knocked out or 2) specially designing ssDNA substrates to avoid MMR. With respect to option 1), using E. coli with inactive MMR is the easier of the two options, but these cells are prone to mutations and can have more unintended changes to their genomes. With respect to option 2), in one embodiment, a C/C mismatch at or within 6 base pairs of the edit site is introduced. In another embodiment, the desired change is flanked with 4-5 silent changes in the wobble positions, i.e., changes are made to the third base of the adjacent 4-5 codons that alter the nucleotide sequence but not the amino acid sequence of the translated protein. These changes can be 5′ or 3′ of the desired change.
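The wobble-position embodiment can be illustrated with a minimal Python sketch. It assumes the standard genetic code, an in-frame coding sequence whose length is a multiple of three, and changes placed only on the 3′ side of the edited codon. The function names `silent_wobble` and `add_silent_flank` are hypothetical, and codons with no synonymous third-base alternative (e.g., ATG or TGG) are simply left unchanged.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def silent_wobble(codon):
    """Return a synonymous codon differing only at the third base, or None."""
    for base in BASES:
        candidate = codon[:2] + base
        if candidate != codon and CODON_TABLE[candidate] == CODON_TABLE[codon]:
            return candidate
    return None


def add_silent_flank(cds, edit_codon_index, n_codons=4):
    """Apply silent third-base changes to the n codons 3' of the edited codon."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    first = edit_codon_index + 1
    for idx in range(first, min(first + n_codons, len(codons))):
        alternative = silent_wobble(codons[idx])
        if alternative is not None:
            codons[idx] = alternative
    return "".join(codons)


cds = "ATGGCTGCTAAAGGTCTGTAA"
recoded = add_silent_flank(cds, edit_codon_index=0, n_codons=4)
print(recoded)  # ATGGCCGCCAAGGGCCTGTAA
```

The recoded sequence differs from the original at four third-base positions while encoding the same amino acid sequence.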

In one embodiment, the polynucleotides or polynucleotide libraries generated using the compositions and/or methods provided herein can be used in a gene editing method that is implemented in a microbial host cell that already stably expresses lambda red recombination genes such as the DY380 strain described at blog.addgene.org/lambda-red-a-homologous-recombination-based-technique-for-genetic-engineering, the contents of which are herein incorporated by reference. Other bacterial strains that comprise components of the lambda red recombination system and can be utilized to generate the organism to be genotyped using an enrichment method provided herein (e.g., CS-seq or SG-seq) can be found in Thomason et al. (Recombineering: Genetic Engineering in Bacteria Using Homologous Recombination. Current Protocols in Molecular Biology. 106:V:1.16:1.16.1-1.16.39) and Sharan et al. (Recombineering: A Homologous Recombination-Based Method of Genetic Engineering. Nature Protocols. 2009; 4(2):206-223), the contents of each of which are herein incorporated by reference.

As provided herein, the set of proteins of the lambda red recombination system can be introduced into the microbial host cell prior to implementation of any of the editing methods known in the art and/or provided herein. Genes for each of the proteins of the lambda red recombination system can be introduced on nucleic acids (e.g., as plasmids, linear DNA or RNA, a mini-λ, a lambda red prophage or integrons) and be integrated into the genome of the host cell or expressed from an extrachromosomal element. In some cases, each of the components (i.e., exo, beta, gam or combinations thereof) of the lambda red recombination system can be introduced as an RNA and be translated by the host cell. In some cases, each of the components (i.e., exo, beta, gam or combinations thereof) of the lambda red recombination system can be introduced as a protein into the host cell.

In one embodiment, genes for the set of proteins of the lambda red recombination system are introduced on a plasmid. The set of proteins of the lambda red recombination system on the plasmid can be under the control of a promoter such as, for example, the endogenous phage pL promoter. In one embodiment, the set of proteins of the lambda red recombination system on the plasmid is under the control of an inducible promoter. The inducible promoter can be inducible by the addition or depletion of a reagent or by a change in temperature. In one embodiment, the set of proteins of the lambda red recombination system on the plasmid is under the control of an inducible promoter such as the IPTG-inducible lac promoter or the arabinose-inducible pBAD promoter. A plasmid expressing genes for the set of proteins of the lambda red recombination system can also express repressors associated with a specific promoter such as, for example, the lacI, araC or cI857 repressors associated with the IPTG-inducible lac promoter, the arabinose-inducible pBAD promoter and the endogenous phage pL promoter, respectively.

In one embodiment, genes for the set of proteins of the lambda red recombination system are introduced on a mini-λ, which is a defective, non-replicating, circular piece of phage DNA that, when introduced into a microbial host cell, integrates into the genome as described at blog.addgene.org/lambda-red-a-homologous-recombination-based-technique-for-genetic-engineering, the contents of which are herein incorporated by reference.

In one embodiment, genes for the set of proteins of the lambda red recombination system are introduced on a lambda red prophage, which can allow for stable integration of the lambda red recombination system into a microbial host cell such as described at blog.addgene.org/lambda-red-a-homologous-recombination-based-technique-for-genetic-engineering, the contents of which are herein incorporated by reference.

CRISPR Mediated Gene Editing

In one aspect provided herein, a genetic element (e.g., genome, cosmid, or plasmid) of a host cell can be modified by CRISPR.

The CRISPR/Cas system is a prokaryotic immune system that confers resistance to foreign genetic elements such as those present within plasmids and phages and that provides a form of acquired immunity. CRISPR stands for Clustered Regularly Interspaced Short Palindromic Repeat, and cas stands for CRISPR-associated system, and refers to the small cas genes associated with the CRISPR complex.

CRISPR-Cas systems are most broadly characterized as either Class 1 or Class 2 systems. The main distinguishing feature between these two systems is the nature of the Cas-effector module. Class 1 systems require assembly of multiple Cas proteins in a complex (referred to as a “Cascade complex”) to mediate interference, while Class 2 systems use a large single Cas enzyme to mediate interference. Each of the Class 1 and Class 2 systems is further divided into multiple CRISPR-Cas types based on the presence of a specific Cas protein. For example, the Class 1 system is divided into the following three types: Type I systems, which contain the Cas3 protein; Type III systems, which contain the Cas10 protein; and the putative Type IV systems, which contain the Csf1 protein, a Cas8-like protein. Class 2 systems are generally less common than Class 1 systems and are further divided into the following three types: Type II systems, which contain the Cas9 protein; Type V systems, which contain Cas12a protein (previously known as Cpf1, and referred to as Cpf1 herein), Cas12b (previously known as C2c1), Cas12c (previously known as C2c3), Cas12d (previously known as CasY), and Cas12e (previously known as CasX); and Type VI systems, which contain Cas13a (previously known as C2c2), Cas13b, and Cas13c. Pyzocha et al., ACS Chemical Biology, Vol. 13 (2), pgs. 347-356. In one embodiment, the CRISPR-Cas system for use in the methods provided herein is a Class 2 system. In one embodiment, the CRISPR-Cas system for use in the methods provided herein is a Type II, Type V or Type VI Class 2 system. In one embodiment, the CRISPR-Cas system for use in the methods provided herein is selected from Cas9, Cas12a, Cas12b, Cas12c, Cas12d, Cas12e, Cas13a, Cas13b, Cas13c or homologs, orthologs or paralogs thereof.

CRISPR systems used in methods disclosed herein comprise a Cas effector module comprising one or more nucleic acid guided CRISPR-associated (Cas) nucleases, referred to herein as Cas effector proteins. In some embodiments, the Cas proteins can comprise one or multiple nuclease domains. A Cas effector protein can target single stranded or double stranded nucleic acid molecules (e.g., DNA or RNA nucleic acids) and can generate double strand or single strand breaks. In some embodiments, the Cas effector proteins are wild-type or naturally occurring Cas proteins. In some embodiments, the Cas effector proteins are mutant Cas proteins, wherein one or more mutations, insertions, or deletions are made in a WT or naturally occurring Cas protein (e.g., a parental Cas protein) to produce a Cas protein with one or more altered characteristics compared to the parental Cas protein.

In some instances, the Cas protein is a wild-type (WT) nuclease. Non-limiting examples of suitable Cas proteins for use in the present disclosure include C2c1, C2c2, C2c3, Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 and Csx12), Cas10, Cpf1, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm1, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx100, Csx16, CsaX, Csx3, Csx1, Csx15, Csf1, Csf2, Csf3, Csf4, MAD1-20, SmCsm1, homologues thereof, orthologues thereof, variants thereof, mutants thereof, or modified versions thereof. Suitable nucleic acid guided nucleases (e.g., Cas9) can be from an organism from a genus, which includes but is not limited to: Thiomicrospira, Succinivibrio, Candidatus, Porphyromonas, Acidomonococcus, Prevotella, Smithella, Moraxella, Synergistes, Francisella, Leptospira, Catenibacterium, Kandleria, Clostridium, Dorea, Coprococcus, Enterococcus, Fructobacillus, Weissella, Pediococcus, Corynebacter, Sutterella, Legionella, Treponema, Roseburia, Filifactor, Eubacterium, Streptococcus, Lactobacillus, Mycoplasma, Bacteroides, Flaviivola, Flavobacterium, Sphaerochaeta, Azospirillum, Gluconacetobacter, Neisseria, Roseburia, Parvibaculum, Staphylococcus, Nitratifractor, Alicyclobacillus, Brevibacilus, Bacillus, Bacteroidetes, Carnobacterium, Clostridiaridium, Clostridium, Desulfonatronum, Desulfovibrio, Helcococcus, Leptotrichia, Listeria, Methanomethyophilus, Methylobacterium, Opitutaceae, Paludibacter, Rhodobacter, Sphaerochaeta, Tuberibacillus, and Campylobacter. Species of organism of such a genus can be as otherwise herein discussed.

Suitable nucleic acid guided nucleases (e.g., Cas9) can be from an organism from a phylum, which includes but is not limited to: Firmicutes, Actinobacteria, Bacteroidetes, Proteobacteria, Spirochaetes, and Tenericutes. Suitable nucleic acid guided nucleases can be from an organism from a class, which includes but is not limited to: Erysipelotrichia, Clostridia, Bacilli, Actinobacteria, Bacteroidetes, Flavobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, Deltaproteobacteria, Epsilonproteobacteria, Spirochaetes, and Mollicutes. Suitable nucleic acid guided nucleases can be from an organism from an order, which includes but is not limited to: Clostridiales, Lactobacillales, Actinomycetales, Bacteroidales, Flavobacteriales, Rhizobiales, Rhodospirillales, Burkholderiales, Neisseriales, Legionellales, Nautiliales, Campylobacterales, Spirochaetales, Mycoplasmatales, and Thiotrichales. Suitable nucleic acid guided nucleases can be from an organism from within a family, which includes but is not limited to: Lachnospiraceae, Enterococcaceae, Leuconostocaceae, Lactobacillaceae, Streptococcaceae, Peptostreptococcaceae, Staphylococcaceae, Eubacteriaceae, Corynebacterineae, Bacteroidaceae, Flavobacterium, Cryomorphaceae, Rhodobiaceae, Rhodospirillaceae, Acetobacteraceae, Sutterellaceae, Neisseriaceae, Legionellaceae, Nautiliaceae, Campylobacteraceae, Spirochaetaceae, Mycoplasmataceae, and Francisellaceae.

Other nucleic acid guided nucleases (e.g., Cas9) suitable for use in the methods, systems, and compositions of the present disclosure include those derived from an organism such as, but not limited to: Thiomicrospira sp. XS5, Eubacterium rectale, Succinivibrio dextrinosolvens, Candidatus Methanoplasma termitum, Candidatus Methanomethylophilus alvus, Porphyromonas crevioricanis, Flavobacterium branchiophilum, Acidomonococcus sp., Lachnospiraceae bacterium COE1, Prevotella brevis ATCC 19188, Smithella sp. SCADC, Moraxella bovoculi, Synergistes jonesii, Bacteroidetes oral taxon 274, Francisella tularensis, Leptospira inadai serovar Lyme str. 10, Acidomonococcus sp. crystal structure (5B43) S. mutans, S. agalactiae, S. equisimilis, S. sanguinis, S. pneumoniae; C. jejuni, C. coli; N. salsuginis, N. tergarcus; S. auricularis, S. carnosus; N. meningitidis, N. gonorrhoeae; L. monocytogenes, L. ivanovii; C. botulinum, C. difficile, C. tetani, C. sordellii; Francisella tularensis 1, Prevotella albensis, Lachnospiraceae bacterium MC2017 1, Butyrivibrio proteoclasticus, Peregrinibacteria bacterium GW2011_GWA2_33_10, Parcubacteria bacterium GW2011_GWC2_44_17, Smithella sp. SCADC, Microgenomates, Acidaminococcus sp. BV3L6, Lachnospiraceae bacterium MA2020, Candidatus Methanoplasma termitum, Eubacterium eligens, Moraxella bovoculi 237, Leptospira inadai, Lachnospiraceae bacterium ND2006, Porphyromonas crevioricanis 3, Prevotella disiens, Porphyromonas macacae, Catenibacterium sp. CAG:290, Kandleria vitulina, Clostridiales bacterium KA00274, Lachnospiraceae bacterium 3-2, Dorea longicatena, Coprococcus catus GD/7, Enterococcus columbae DSM 7374, Fructobacillus sp. EFB-N1, Weissella halotolerans, Pediococcus acidilactici, Lactobacillus curvatus, Streptococcus pyogenes, Lactobacillus versmoldensis, and Filifactor alocis ATCC 35896. See, U.S. Pat. Nos. 8,697,359; 8,771,945; 8,795,965; 8,865,406; 8,871,445; 8,889,356; 8,895,308; 8,906,616; 8,932,814; 8,945,839; 8,993,233; 8,999,641; 9,822,372; 9,840,713; U.S. patent application Ser. No. 13/842,859 (US 2014/0068797 A1); U.S. Pat. Nos. 9,260,723; 9,023,649; 9,834,791; 9,637,739; U.S. patent application Ser. No. 14/683,443 (US 2015/0240261 A1); U.S. patent application Ser. No. 14/743,764 (US 2015/0291961 A1); U.S. Pat. Nos. 9,790,490; 9,688,972; 9,580,701; 9,745,562; 9,816,081; 9,677,090; 9,738,687; U.S. application Ser. No. 15/632,222 (US 2017/0369879 A1); U.S. application Ser. No. 15/631,989; U.S. application Ser. No. 15/632,001; and U.S. Pat. No. 9,896,696, each of which is herein incorporated by reference.

In some embodiments, a Cas effector protein comprises one or more of the following activities:

a nickase activity, i.e., the ability to cleave a single strand of a nucleic acid molecule;

a double stranded nuclease activity, i.e., the ability to cleave both strands of a double stranded nucleic acid and create a double stranded break;

an endonuclease activity;

an exonuclease activity; and/or

a helicase activity, i.e., the ability to unwind the helical structure of a double stranded nucleic acid.

In aspects of the disclosure the term “guide nucleic acid” refers to a polynucleotide comprising 1) a guide sequence capable of hybridizing to a target sequence (referred to herein as a “targeting segment”) and 2) a scaffold sequence capable of interacting with (either alone or in combination with a tracrRNA molecule) a nucleic acid guided nuclease as described herein (referred to herein as a “scaffold segment”). A guide nucleic acid can be DNA. A guide nucleic acid can be RNA. A guide nucleic acid can comprise both DNA and RNA. A guide nucleic acid can comprise modified non-naturally occurring nucleotides. In cases where the guide nucleic acid comprises RNA, the RNA guide nucleic acid can be encoded by a DNA sequence on a polynucleotide molecule such as a plasmid or linear construct generated using the methods and compositions provided herein.

In some embodiments, the guide nucleic acids described herein are RNA guide nucleic acids (“guide RNAs” or “gRNAs”) and comprise a targeting segment and a scaffold segment. In some embodiments, the scaffold segment of a gRNA is comprised in one RNA molecule and the targeting segment is comprised in another separate RNA molecule. Such embodiments are referred to herein as “double-molecule gRNAs” or “two-molecule gRNA” or “dual gRNAs.” In some embodiments, the gRNA is a single RNA molecule and is referred to herein as a “single-guide RNA” or an “sgRNA.” The term “guide RNA” or “gRNA” is inclusive, referring both to two-molecule guide RNAs and sgRNAs.

In one embodiment, an assembly comprising a pair of first and second polynucleotides with an insert polynucleotide located therebetween generated using the methods and compositions provided herein is a guide RNA (gRNA). In some cases, the methods provided herein are used to generate a library of gRNAs.

The DNA-targeting segment of a gRNA comprises a nucleotide sequence that is complementary to a sequence in a target nucleic acid sequence. As such, the targeting segment of a gRNA interacts with a target nucleic acid in a sequence-specific manner via hybridization (i.e., base pairing), and the nucleotide sequence of the targeting segment determines the location within the target DNA that the gRNA will bind. The degree of complementarity between a guide sequence and its corresponding target sequence, when optimally aligned using a suitable alignment algorithm, is about or more than about 50%, 60%, 75%, 80%, 85%, 90%, 95%, 97.5%, 99%, or more. Optimal alignment may be determined with the use of any suitable algorithm for aligning sequences. In some embodiments, a guide sequence is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 45, 50, 75, or more nucleotides in length. In some embodiments, a guide sequence is less than about 75, 50, 45, 40, 35, 30, 25, 20 nucleotides in length. In aspects, the guide sequence is 10-30 nucleotides long. The guide sequence can be 15-20 nucleotides in length. The guide sequence can be 15 nucleotides in length. The guide sequence can be 16 nucleotides in length. The guide sequence can be 17 nucleotides in length. The guide sequence can be 18 nucleotides in length. The guide sequence can be 19 nucleotides in length. The guide sequence can be 20 nucleotides in length.
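For illustration only, the degree of complementarity between a guide sequence and a target can be estimated with the simplified Python sketch below. It assumes an ungapped, end-to-end comparison of equal-length sequences rather than an optimal alignment produced by an alignment algorithm, and the function names and toy sequences are hypothetical.

```python
# Watson-Crick partners written as DNA letters; U (RNA) pairs with A.
COMPLEMENT = {"A": "T", "T": "A", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the DNA sequence that base-pairs perfectly with `seq` (antiparallel)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def percent_complementarity(guide, target_strand):
    """Percentage of guide positions that base-pair with the target strand.

    Both sequences are written 5' to 3' and must have the same length;
    no gaps or offsets are considered in this simplified comparison.
    """
    if len(guide) != len(target_strand):
        raise ValueError("guide and target must have the same length")
    perfect_target = reverse_complement(guide)
    matches = sum(1 for a, b in zip(perfect_target, target_strand.upper()) if a == b)
    return 100.0 * matches / len(guide)


guide = "GGGGGAAAAACCCCCUUUUU"          # 20-nucleotide RNA guide sequence (toy example)
perfect = "AAAAAGGGGGTTTTTCCCCC"        # fully complementary DNA target strand
one_mismatch = "TAAAAGGGGGTTTTTCCCCC"   # same target with a single mismatch

print(percent_complementarity(guide, perfect))       # 100.0
print(percent_complementarity(guide, one_mismatch))  # 95.0
```

A single mismatch in a 20-nucleotide guide thus corresponds to 95% complementarity under this simplified measure.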

The scaffold segment of a guide RNA interacts with one or more Cas effector proteins to form a ribonucleoprotein complex (referred to herein as a CRISPR-RNP or a RNP-complex). The guide RNA directs the bound polypeptide to a specific nucleotide sequence within a target nucleic acid sequence via the above-described targeting segment. The scaffold segment of a guide RNA comprises two stretches of nucleotides that are complementary to one another and which form a double stranded RNA duplex. Sufficient sequence within the scaffold sequence to promote formation of a targetable nuclease complex may include a degree of complementarity along the length of two sequence regions within the scaffold sequence, such as one or two sequence regions involved in forming a secondary structure. In some cases, the one or two sequence regions are comprised or encoded on the same polynucleotide. In some cases, the one or two sequence regions are comprised or encoded on separate polynucleotides. Optimal alignment may be determined by any suitable alignment algorithm, and may further account for secondary structures, such as self-complementarity within either the one or two sequence regions. In some embodiments, the degree of complementarity between the one or two sequence regions along the length of the shorter of the two when optimally aligned is about or more than about 25%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 97.5%, 99%, or higher. In some embodiments, at least one of the two sequence regions is about or more than about 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 30, 40, 50, or more nucleotides in length.

A scaffold sequence of a subject gRNA can comprise a secondary structure. A secondary structure can comprise a pseudoknot region or stem-loop structure. In some examples, the compatibility of a guide nucleic acid and nucleic acid guided nuclease is at least partially determined by sequence within or adjacent to the secondary structure region of the guide RNA. In some cases, binding kinetics of a guide nucleic acid to a nucleic acid guided nuclease is determined in part by secondary structures within the scaffold sequence. In some cases, binding kinetics of a guide nucleic acid to a nucleic acid guided nuclease is determined in part by nucleic acid sequence within the scaffold sequence.

A compatible scaffold sequence for a gRNA-Cas effector protein combination can be found by scanning sequences adjacent to a native Cas nuclease locus. In other words, native Cas nucleases can be encoded on a genome within proximity to a corresponding compatible guide nucleic acid or scaffold sequence.

Nucleic acid guided nucleases can be compatible with guide nucleic acids that are not found within the nuclease's endogenous host. Such orthogonal guide nucleic acids can be determined by empirical testing. Orthogonal guide nucleic acids can come from different bacterial species or be synthetic or otherwise engineered to be non-naturally occurring. Orthogonal guide nucleic acids that are compatible with a common nucleic acid-guided nuclease can comprise one or more common features. Common features can include sequence outside a pseudoknot region. Common features can include a pseudoknot region. Common features can include a primary sequence or secondary structure.

A guide nucleic acid can be engineered to target a desired target sequence by altering the guide sequence such that the guide sequence is complementary to the target sequence, thereby allowing hybridization between the guide sequence and the target sequence. A guide nucleic acid with an engineered guide sequence can be referred to as an engineered guide nucleic acid. Engineered guide nucleic acids are often non-naturally occurring and are not found in nature.

In some embodiments, the present disclosure provides a polynucleotide encoding a gRNA generated using the compositions and methods provided herein. In some embodiments, the composition comprising a pair of first and second polynucleotides and an insert polynucleotide further comprises an expression vector such that assembly of the pair of first and second polynucleotides with the insert polynucleotide and expression vector generates an expression vector comprising a gRNA-encoding nucleic acid.

In another embodiment, an assembly comprising a pair of first and second polynucleotides with an insert polynucleotide located therebetween generated using the methods and compositions provided herein is a donor DNA sequence. In some cases, the methods provided herein are used to generate a library of donor DNA sequences. The donor DNA sequence can be used in combination with a guide RNA (gRNA) in a CRISPR method of gene editing using homology directed repair (HDR). The CRISPR complex can result in strand breaks within the target gene(s) that can be repaired by using homology directed repair (HDR). HDR mediated repair can be facilitated by co-transforming the host cell with a donor DNA sequence generated using the methods and compositions provided herein. The donor DNA sequence can comprise a desired genetic perturbation (e.g., deletion, insertion, and/or single nucleotide polymorphism) as well as targeting sequences derived from the first and second polynucleotides. In this embodiment, the CRISPR complex cleaves the target gene specified by the one or more gRNAs. The donor DNA sequence can then be used as a template for the homologous recombination machinery to incorporate the desired genetic perturbation into the host cell. The donor DNA can be single-stranded, double-stranded or a double-stranded plasmid. The donor DNA can lack a PAM sequence or comprise a scrambled, altered or non-functional PAM in order to prevent re-cleavage. In some cases, the donor DNA can contain a functional or non-altered PAM site. The mutated or edited sequence in the donor DNA (also flanked by the regions of homology) prevents re-cleavage by the CRISPR-complex after the mutation(s) has/have been incorporated into the genome.

Host Cells

As provided herein, the libraries of nucleic acid constructs generated using the compositions and/or methods provided herein can be used to edit or modify a genetic element (e.g., genome, cosmid or plasmid) of a host cell or engineer the host cell via introducing (e.g., transforming or transducing) one or more genetic element(s) (e.g., plasmid or cosmid) into said host cell. The genomic engineering or editing methods can be applicable to any organism where desired traits can be identified in a population of genetic mutants. The organism can be a microorganism or higher eukaryotic organism.

Thus, as used herein, the term “microorganism” should be taken broadly. It includes, but is not limited to, the two prokaryotic domains, Bacteria and Archaea, as well as certain eukaryotic fungi and protists. However, in certain aspects, “higher” eukaryotic organisms such as insects, plants, and animals can be utilized in the methods taught herein.

Suitable host cells include, but are not limited to: bacterial cells, algal cells, plant cells, fungal cells, insect cells, and mammalian cells. In one illustrative embodiment, suitable host cells include E. coli (e.g., SHuffle™ competent E. coli available from New England BioLabs in Ipswich, Mass.).

Other suitable host organisms of the present disclosure include microorganisms of the genus Corynebacterium. In some embodiments, preferred Corynebacterium strains/species include: C. efficiens, with the deposited type strain being DSM44549, C. glutamicum, with the deposited type strain being ATCC13032, and C. ammoniagenes, with the deposited type strain being ATCC6871. In some embodiments, the preferred host of the present disclosure is C. glutamicum.

Suitable host strains of the genus Corynebacterium, in particular of the species Corynebacterium glutamicum, are in particular the known wild-type strains: Corynebacterium glutamicum ATCC13032, Corynebacterium acetoglutamicum ATCC 15806, Corynebacterium acetoacidophilum ATCC 13870, Corynebacterium melassecola ATCC17965, Corynebacterium thermoaminogenes FERM BP-1539, Brevibacterium flavum ATCC14067, Brevibacterium lactofermentum ATCC13869, and Brevibacterium divaricatum ATCC14020; and L-amino acid-producing mutants, or strains, prepared therefrom, such as, for example, the L-lysine-producing strains: Corynebacterium glutamicum FERM-P 1709, Brevibacterium flavum FERM-P 1708, Brevibacterium lactofermentum FERM-P 1712, Corynebacterium glutamicum FERM-P 6463, Corynebacterium glutamicum FERM-P 6464, Corynebacterium glutamicum DM58-1, Corynebacterium glutamicum DG52-5, Corynebacterium glutamicum DSM5714, and Corynebacterium glutamicum DSM12866.

The term “Micrococcus glutamicus” has also been in use for C. glutamicum. Some representatives of the species C. efficiens have also been referred to as C. thermoaminogenes in the prior art, such as the strain FERM BP-1539, for example.

In some embodiments, the host cell of the present disclosure is a eukaryotic cell. Suitable eukaryotic host cells include, but are not limited to: fungal cells, algal cells, insect cells, animal cells, and plant cells. Suitable fungal host cells include, but are not limited to: Ascomycota, Basidiomycota, Deuteromycota, Zygomycota, Fungi imperfecti. Certain preferred fungal host cells include yeast cells and filamentous fungal cells. Suitable filamentous fungi host cells include, for example, any filamentous forms of the subdivision Eumycotina and Oomycota (see, e.g., Hawksworth et al., In Ainsworth and Bisby's Dictionary of The Fungi, 8th edition, 1995, CAB International, University Press, Cambridge, UK, which is incorporated herein by reference). Filamentous fungi are characterized by a vegetative mycelium with a cell wall composed of chitin, cellulose and other complex polysaccharides. The filamentous fungi host cells are morphologically distinct from yeast.

In certain illustrative, but non-limiting embodiments, the filamentous fungal host cell may be a cell of a species of: Achlya, Acremonium, Aspergillus, Aureobasidium, Bjerkandera, Ceriporiopsis, Cephalosporium, Chrysosporium, Cochliobolus, Corynascus, Cryphonectria, Cryptococcus, Coprinus, Coriolus, Diplodia, Endothis, Fusarium, Gibberella, Gliocladium, Humicola, Hypocrea, Myceliophthora (e.g., Myceliophthora thermophila), Mucor, Neurospora, Penicillium, Podospora, Phlebia, Piromyces, Pyricularia, Rhizomucor, Rhizopus, Schizophyllum, Scytalidium, Sporotrichum, Talaromyces, Thermoascus, Thielavia, Trametes, Tolypocladium, Trichoderma, Verticillium, Volvariella, or teleomorphs, or anamorphs, and synonyms or taxonomic equivalents thereof.

Suitable yeast host cells include, but are not limited to: Candida, Hansenula, Saccharomyces, Schizosaccharomyces, Pichia, Kluyveromyces, and Yarrowia. In some embodiments, the yeast cell is Hansenula polymorpha, Saccharomyces cerevisiae, Saccharomyces carlsbergensis, Saccharomyces diastaticus, Saccharomyces norbensis, Saccharomyces kluyveri, Schizosaccharomyces pombe, Pichia pastoris, Pichia finlandica, Pichia trehalophila, Pichia kodamae, Pichia membranaefaciens, Pichia opuntiae, Pichia thermotolerans, Pichia salictaria, Pichia quercuum, Pichia pijperi, Pichia stipitis, Pichia methanolica, Pichia angusta, Kluyveromyces lactis, Candida albicans, or Yarrowia lipolytica.

In certain embodiments, the host cell is an algal cell such as Chlamydomonas (e.g., C. reinhardtii) or Phormidium (e.g., P. sp. ATCC29409).

In other embodiments, the host cell is a prokaryotic cell. Suitable prokaryotic cells include gram positive, gram negative, and gram-variable bacterial cells. The host cell may be a species of, but not limited to: Agrobacterium, Alicyclobacillus, Anabaena, Anacystis, Acinetobacter, Acidothermus, Arthrobacter, Azobacter, Bacillus, Bifidobacterium, Brevibacterium, Butyrivibrio, Buchnera, Campestris, Campylobacter, Clostridium, Corynebacterium, Chromatium, Coprococcus, Escherichia, Enterococcus, Enterobacter, Erwinia, Fusobacterium, Faecalibacterium, Francisella, Flavobacterium, Geobacillus, Haemophilus, Helicobacter, Klebsiella, Lactobacillus, Lactococcus, Ilyobacter, Micrococcus, Microbacterium, Mesorhizobium, Methylobacterium, Mycobacterium, Neisseria, Pantoea, Pseudomonas, Prochlorococcus, Rhodobacter, Rhodopseudomonas, Roseburia, Rhodospirillum, Rhodococcus, Scenedesmus, Streptomyces, Streptococcus, Synechococcus, Saccharomonospora, Staphylococcus, Serratia, Salmonella, Shigella, Thermoanaerobacterium, Tropheryma, Tularensis, Temecula, Thermosynechococcus, Thermococcus, Ureaplasma, Xanthomonas, Xylella, Yersinia, and Zymomonas. In some embodiments, the host cell is Corynebacterium glutamicum.

In some embodiments, the bacterial host strain is an industrial strain. Numerous bacterial industrial strains are known and suitable in the methods and compositions described herein.

In some embodiments, the bacterial host cell is of the Agrobacterium species (e.g., A. radiobacter, A. rhizogenes, A. rubi), the Arthrobacter species (e.g., A. aurescens, A. citreus, A. globiformis, A. hydrocarboglutamicus, A. mysorens, A. nicotianae, A. paraffineus, A. protophormiae, A. roseoparaffinus, A. sulfureus, A. ureafaciens), or the Bacillus species (e.g., B. thuringiensis, B. anthracis, B. megaterium, B. subtilis, B. lentus, B. circulans, B. pumilus, B. lautus, B. coagulans, B. brevis, B. firmus, B. alkalophilus, B. licheniformis, B. clausii, B. stearothermophilus, B. halodurans and B. amyloliquefaciens). In particular embodiments, the host cell will be an industrial Bacillus strain including but not limited to B. subtilis, B. pumilus, B. licheniformis, B. megaterium, B. clausii, B. stearothermophilus and B. amyloliquefaciens. In some embodiments, the host cell will be an industrial Clostridium species (e.g., C. acetobutylicum, C. tetani E88, C. lituseburense, C. saccharobutylicum, C. perfringens, C. beijerinckii). In some embodiments, the host cell will be an industrial Corynebacterium species (e.g., C. glutamicum, C. acetoacidophilum). In some embodiments, the host cell will be an industrial Escherichia species (e.g., E. coli). In some embodiments, the host cell will be an industrial Erwinia species (e.g., E. uredovora, E. carotovora, E. ananas, E. herbicola, E. punctata, E. terreus). In some embodiments, the host cell will be an industrial Pantoea species (e.g., P. citrea, P. agglomerans). In some embodiments, the host cell will be an industrial Pseudomonas species (e.g., P. putida, P. aeruginosa, P. mevalonii). In some embodiments, the host cell will be an industrial Streptococcus species (e.g., S. equisimilis, S. pyogenes, S. uberis). In some embodiments, the host cell will be an industrial Streptomyces species (e.g., S. ambofaciens, S. achromogenes, S. avermitilis, S. coelicolor, S. aureofaciens, S. aureus, S. fungicidicus, S. griseus, S. lividans). In some embodiments, the host cell will be an industrial Zymomonas species (e.g., Z. mobilis, Z. lipolytica), and the like.

In some embodiments, the host cell will be an industrial Escherichia species (e.g., E. coli).

Suitable host strains of the E. coli species comprise: Enterotoxigenic E. coli (ETEC), Enteropathogenic E. coli (EPEC), Enteroinvasive E. coli (EIEC), Enterohemorrhagic E. coli (EHEC), Uropathogenic E. coli (UPEC), Verotoxin-producing E. coli, E. coli O157:H7, E. coli O104:H4, Escherichia coli O121, Escherichia coli O104:H21, Escherichia coli K1, and Escherichia coli NC101. In some embodiments, the present disclosure teaches genomic engineering of E. coli K12, E. coli B, and E. coli C.

In some embodiments, the host cell can be E. coli strains NCTC 12757, NCTC 12779, NCTC 12790, NCTC 12796, NCTC 12811, ATCC 11229, ATCC 25922, ATCC 8739, DSM 30083, BC 5849, BC 8265, BC 8267, BC 8268, BC 8270, BC 8271, BC 8272, BC 8273, BC 8276, BC 8277, BC 8278, BC 8279, BC 8312, BC 8317, BC 8319, BC 8320, BC 8321, BC 8322, BC 8326, BC 8327, BC 8331, BC 8335, BC 8338, BC 8341, BC 8344, BC 8345, BC 8346, BC 8347, BC 8348, BC 8863, and BC 8864.

In some embodiments, the present disclosure teaches host cells that can be verocytotoxigenic E. coli (VTEC), such as strains BC 4734 (O26:H11), BC 4735 (O157:H-), BC 4736, BC 4737 (n.d.), BC 4738 (O157:H7), BC 4945 (O26:H-), BC 4946 (O157:H7), BC 4947 (O111:H-), BC 4948 (O157:H), BC 4949 (O5), BC 5579 (O157:H7), BC 5580 (O157:H7), BC 5582 (O3:H), BC 5643 (O2:H5), BC 5644 (O128), BC 5645 (O55:H-), BC 5646 (O69:H-), BC 5647 (O101:H9), BC 5648 (O103:H2), BC 5850 (O22:H8), BC 5851 (O55:H-), BC 5852 (O48:H21), BC 5853 (O26:H11), BC 5854 (O157:H7), BC 5855 (O157:H-), BC 5856 (O26:H-), BC 5857 (O103:H2), BC 5858 (O26:H11), BC 7832, BC 7833 (Oraw form:H-), BC 7834 (ONT:H-), BC 7835 (O103:H2), BC 7836 (O57:H-), BC 7837 (ONT:H-), BC 7838, BC 7839 (O128:H2), BC 7840 (O157:H-), BC 7841 (O23:H-), BC 7842 (O157:H-), BC 7843, BC 7844 (O157:H-), BC 7845 (O103:H2), BC 7846 (O26:H11), BC 7847 (O145:H-), BC 7848 (O157:H-), BC 7849 (O156:H47), BC 7850, BC 7851 (O157:H-), BC 7852 (O157:H-), BC 7853 (O5:H-), BC 7854 (O157:H7), BC 7855 (O157:H7), BC 7856 (O26:H-), BC 7857, BC 7858, BC 7859 (ONT:H-), BC 7860 (O129:H-), BC 7861, BC 7862 (O103:H2), BC 7863, BC 7864 (Oraw form:H-), BC 7865, BC 7866 (O26:H-), BC 7867 (Oraw form:H-), BC 7868, BC 7869 (ONT:H-), BC 7870 (O113:H-), BC 7871 (ONT:H-), BC 7872 (ONT:H-), BC 7873, BC 7874 (Oraw form:H-), BC 7875 (O157:H-), BC 7876 (O111:H-), BC 7877 (O146:H21), BC 7878 (O145:H-), BC 7879 (O22:H8), BC 7880 (Oraw form:H-), BC 7881 (O145:H-), BC 8275 (O157:H7), BC 8318 (O55:K-:H-), BC 8325 (O157:H7), BC 8332 (ONT), and BC 8333.

In some embodiments, the present disclosure teaches host cells that can be enteroinvasive E. coli (EIEC), such as strains BC 8246 (O152:K-:H-), BC 8247 (O124:K(72):H3), BC 8248 (O124), BC 8249 (O112), BC 8250 (O136:K(78):H-), BC 8251 (O124:H-), BC 8252 (O144:K-:H-), BC 8253 (O143:K:H-), BC 8254 (O143), BC 8255 (O112), BC 8256 (O28a.e), BC 8257 (O124:H-), BC 8258 (O143), BC 8259 (O167:K-:H5), BC 8260 (O128a.c.:H35), BC 8261 (O164), BC 8262 (O164:K-:H-), BC 8263 (O164), and BC 8264 (O124).

In some embodiments, the present disclosure teaches host cells that can be enterotoxigenic E. coli (ETEC), such as strains BC 5581 (O78:H11), BC 5583 (O2:K1), BC 8221 (O118), BC 8222 (O148:H-), BC 8223 (O111), BC 8224 (O110:H-), BC 8225 (O148), BC 8226 (O118), BC 8227 (O25:H42), BC 8229 (O6), BC 8231 (O153:H45), BC 8232 (O9), BC 8233 (O148), BC 8234 (O128), BC 8235 (O118), BC 8237 (O111), BC 8238 (O110:H17), BC 8240 (O148), BC 8241 (O6H16), BC 8243 (O153), BC 8244 (O15:H-), BC 8245 (O20), BC 8269 (O125a.c:H-), BC 8313 (O6:H6), BC 8315 (O153:H-), BC 8329, BC 8334 (O118:H12), and BC 8339.

In some embodiments, the present disclosure teaches host cells that can be enteropathogenic E. coli (EPEC), such as strains BC 7567 (O86), BC 7568 (O128), BC 7571 (O114), BC 7572 (O119), BC 7573 (O125), BC 7574 (O124), BC 7576 (O127a), BC 7577 (O126), BC 7578 (O142), BC 7579 (O26), BC 7580 (OK26), BC 7581 (O142), BC 7582 (O55), BC 7583 (O158), BC 7584 (O-), BC 7585 (O-), BC 7586 (O-), BC 8330, BC 8550 (O26), BC 8551 (O55), BC 8552 (O158), BC 8553 (O26), BC 8554 (O158), BC 8555 (O86), BC 8556 (O128), BC 8557 (OK26), BC 8558 (O55), BC 8560 (O158), BC 8561 (O158), BC 8562 (O114), BC 8563 (O86), BC 8564 (O128), BC 8565 (O158), BC 8566 (O158), BC 8567 (O158), BC 8568 (O111), BC 8569 (O128), BC 8570 (O114), BC 8571 (O128), BC 8572 (O128), BC 8573 (O158), BC 8574 (O158), BC 8575 (O158), BC 8576 (O158), BC 8577 (O158), BC 8578 (O158), BC 8581 (O158), BC 8583 (O128), BC 8584 (O158), BC 8585 (O128), BC 8586 (O158), BC 8588 (O26), BC 8589 (O86), BC 8590 (O127), BC 8591 (O128), BC 8592 (O114), BC 8593 (O114), BC 8594 (O114), BC 8595 (O125), BC 8596 (O158), BC 8597 (O26), BC 8598 (O26), BC 8599 (O158), BC 8605 (O158), BC 8606 (O158), BC 8607 (O158), BC 8608 (O128), BC 8609 (O55), BC 8610 (O114), BC 8615 (O158), BC 8616 (O128), BC 8617 (O26), BC 8618 (O86), BC 8619, BC 8620, BC 8621, BC 8622, BC 8623, BC 8624 (O158), and BC 8625 (O158).

In some embodiments, the present disclosure also teaches host cells that can be Shigella organisms, including Shigella flexneri, Shigella dysenteriae, Shigella boydii, and Shigella sonnei.

The present disclosure is also suitable for use with a variety of animal cell types, including mammalian cells, for example, human (including 293, WI38, PER.C6 and Bowes melanoma cells), mouse (including 3T3, NS0, NS1, Sp2/0), hamster (CHO, BHK), monkey (COS, FRhL, Vero), and hybridoma cell lines.

In various embodiments, strains that may be used in the practice of the disclosure, including both prokaryotic and eukaryotic strains, are readily accessible to the public from a number of culture collections such as American Type Culture Collection (ATCC), Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH (DSM), Centraalbureau Voor Schimmelcultures (CBS), and Agricultural Research Service Patent Culture Collection, Northern Regional Research Center (NRRL).

In some embodiments, the methods of the present disclosure are also applicable to multi-cellular organisms. For example, the platform could be used for improving the performance of crops. The organisms can comprise a plurality of plants such as Gramineae, Festucoideae, Poacoideae, Agrostis, Phleum, Dactylis, Sorghum, Setaria, Zea, Oryza, Triticum, Secale, Avena, Hordeum, Saccharum, Poa, Festuca, Stenotaphrum, Cynodon, Coix, Olyreae, Phareae, Compositae or Leguminosae. For example, the plants can be corn, rice, soybean, cotton, wheat, rye, oats, barley, pea, beans, lentil, peanut, yam bean, cowpeas, velvet beans, clover, alfalfa, lupine, vetch, lotus, sweet clover, wisteria, sweet pea, sorghum, millet, sunflower, canola or the like. Similarly, the organisms can include a plurality of animals such as non-human mammals, fish, insects, or the like.

Transformation of Host Cells

In some embodiments, the constructs generated by the methods of the present disclosure may be introduced into the host cells using any of a variety of techniques, including transformation, transfection, transduction, viral infection, gene guns, or Ti-mediated gene transfer. Particular methods include calcium phosphate transfection, DEAE-Dextran mediated transfection, lipofection, or electroporation (Davis, L., Dibner, M., Battey, I., 1986 “Basic Methods in Molecular Biology”). Other methods of transformation include, for example, lithium acetate transformation and electroporation. See, e.g., Gietz et al., Nucleic Acids Res. 27:69-74 (1992); Ito et al., J. Bacteriol. 153:163-168 (1983); and Becker and Guarente, Methods in Enzymology 194:182-187 (1991). In some embodiments, transformed host cells are referred to as recombinant host strains.

Automation

In one embodiment, the compositions and methods provided herein are incorporated into a high-throughput (HTP) method for genetic engineering of a host cell. In another embodiment, the methods provided herein can be a molecular tool that is part of the suite of HTP molecular tool sets described in PCT/US18/36360, PCT/US18/36333 or WO 2017/100377, each of which is herein incorporated by reference, for all purposes, to create HTP genetic design libraries, which are derived from, inter alia, scientific insight and iterative pattern recognition. The compositions and methods provided herein can be used to generate libraries for use in high-throughput methods such as those described in PCT/US18/36360, PCT/US18/36333 or WO 2017/100377. Examples of libraries that can be generated using the methods provided herein can include, but are not limited to, promoter ladders, terminator ladders, solubility tag ladders or degradation tag ladders. Examples of high-throughput genomic engineering methods that can utilize the compositions and methods provided herein can include, but are not limited to, promoter swapping, terminator (stop) swapping, solubility tag swapping, degradation tag swapping or SNP swapping as described in PCT/US18/36360, PCT/US18/36333 or WO 2017/100377. The high-throughput methods can be automated and/or utilize robotics and liquid handling platforms (e.g., plate robotics platforms and liquid handling machines known in the art). The high-throughput methods can utilize multi-well plates such as, for example, microtiter plates.

In some embodiments, the automated methods of the disclosure comprise a robotic system. The systems outlined herein are generally directed to the use of 96- or 384-well microtiter plates, but as will be appreciated by those in the art, any number of different plates or configurations may be used. In addition, any or all of the steps outlined herein may be automated; thus, for example, the systems may be completely or partially automated. The robotic systems compatible with the methods and compositions provided herein can be those described in PCT/US18/36360, PCT/US18/36333 or WO 2017/100377.

Kits

Also provided by the present disclosure are kits for practicing the methods for generating nucleic acid assemblies or libraries derived therefrom as described above. The kit can comprise a mixture containing all of the reagents necessary for assembling ssDNA molecules (e.g., oligonucleotides) or dsDNA molecules. In certain embodiments, a subject kit may contain: (i) a first pool of polynucleotides containing pairs of first and second polynucleotides, (ii) a second pool of insert polynucleotides, and (iii) optionally, a suitable cloning vector for propagation of the generated assemblies in a suitable host cell. In some cases, the kit includes a positive control.

In one embodiment, the kits provided herein further comprise a 5′-3′ exonuclease, and a strand-displacing polymerase. In another embodiment, the kits provided herein further comprise a 5′-3′ exonuclease, a ligase and a strand-displacing polymerase. In a still further embodiment, the kits provided herein comprise a single-stranded (ss) binding protein. The ss binding protein can be an extreme thermostable single-stranded DNA binding protein (ET SSB), E. coli recA, T7 gene 2.5 product, phage lambda RedB or Rac prophage RecT.

In a separate embodiment, the kits provided herein further comprise a 5′ to 3′ exonuclease that lacks 3′ exonuclease activity, a crowding agent, a thermostable non-strand-displacing DNA polymerase with 3′ exonuclease activity, or a mixture of said DNA polymerase with a second DNA polymerase that lacks 3′ exonuclease activity, and an isolated thermostable ligase, in appropriate amounts. The crowding agent can be PEG, dextran or ficoll. For example, the kit may contain T5 exonuclease, PEG, PHUSION® DNA polymerase, and Taq ligase. In another example, the kit comprises: Exonuclease III, PEG, AMPLITAQ GOLD® DNA polymerase, and Taq ligase.

Any of the kits provided herein may also contain other reagents described above and below that may be employed in the method, e.g., a mismatch repair enzyme such as mutHLS, cel-1 nuclease, T7 endo 1, uvrD, T4 EndoVII, E. coli EndoV, a buffer, dNTPs, plasmids into which to insert the synthon and/or competent cells to receive the plasmids, controls, etc., depending on how the method is going to be implemented.

The components of the kit may be combined in one container, or each component may be in its own container. For example, the components of the kit may be combined in a single reaction tube or in one or more different reaction tubes.

In addition to the above-mentioned components, the subject kit further includes instructions for using the components of the kit to practice the subject method. The instructions for practicing the subject method are generally recorded on a suitable recording medium. For example, the instructions may be printed on a substrate, such as paper or plastic, etc. As such, the instructions may be present in the kits as a package insert, in the labeling of the container of the kit or components thereof (i.e., associated with the packaging or subpackaging), etc. In other embodiments, the actual instructions are not present in the kit, but means for obtaining the instructions from a remote source, e.g., via the internet, are provided. An example of this embodiment is a kit that includes a web address where the instructions can be viewed and/or from which the instructions can be downloaded.

Compositions, kits and methods for assembling pairs of polynucleotides in a first pool and insert polynucleotides in a second pool as described herein result in a product that is a dsDNA that can serve as a template for PCR, RCA or a variety of other molecular biology applications including direct transformation or transfection of a competent prokaryotic or eukaryotic host cell.

EXAMPLES

The present disclosure is further illustrated by reference to the following Examples. However, it should be noted that these Examples, like the embodiments described above, are illustrative and are not to be construed as restricting the scope of the disclosure in any way.

Example 1—Proof of Principle of Method for Multiplexed Assembly of DNA Fragments into Deterministic Library of Plasmids

Objective

This example describes the use of an in vitro assembly reaction as shown schematically in FIG. 1 to deterministically join a pool comprising precisely designed DNA parts comprising 4 components to generate a library of desired plasmids.

Methods/Results

Insert sequences (i.e., payloads) were synthesized on either a column or an array to generate a pool of column-synthesized payloads and a pool of array-synthesized payloads, each containing a mixture of the 7 payload sequences shown below.

>pMB070_promoter (SEQ ID NO. 1)
ACCGTGCGTGTTGACAATTTTACCTCTGGCGGTGATACTGGTTGCATGTACTAAGGAGGTTGT
>b2405_promoter (SEQ ID NO. 2)
ATGTCGGATATCTGGTGGTGAAATACTTTATGCCATGATAATTTAATACGATGTATTTATTATATGGAGCACTTAATT
>b0605_promoter (SEQ ID NO. 3)
TAATGGAAACGCATTAGCCGAATCGGCAAAAATTGGTTACCTTACATCTCATCGAAAACACGGAGGAAGTATAG
>pMB043_promoter (SEQ ID NO. 4)
ACCGTGCGTGTTGACTATTTTACCTCTGGCGGTTAGAGTTAACATCCTACAAGGAGAACAAAAGC
>pMB071_promoter (SEQ ID NO. 5)
ACCGTGCGTGTTGACTTAAATACCACTGGCGGTGATAATGGTTGCATGTACTAAGGAGGTTGT
>b0159_promoter (SEQ ID NO. 6)
CTCTCCCGCGTGAGAAATACGCTTCCCCGTAAGCGCATGGTAAACTATGCCTTCAAATCGGGCTTATCGCGAGTAAATCT
>pMB090_promoter (SEQ ID NO. 7)
ACCGTGCGTGTTTACAATTTTACCTCTGGCGGTGATAATTAACATCCTACAAGGAGAACAAAAGC

Separately, 6 pools comprising pairs of left and right homology arms were generated (i.e., loci pool numbers referenced in FIG. 4). Each loci pool contained a number of homology arms that each comprised sequence complementary to a separate locus in the genome of an Escherichia coli host cell; the number of unique loci (i.e., number of pairs of homology arms) is given below the plot for each pool in FIG. 4 (i.e., ‘Loci in Pool’ in the table below the graph). The sequences of each homology arm in each pair from each pool are SEQ ID NOs 8-179.

The pool of column-synthesized payloads and the pool of array-synthesized payloads were each separately mixed with the specific loci pool designated in FIG. 4. In each mixture, the molar ratio of left homology arm:payload:right homology arm was roughly 1:10:1. There was an excess of the payload because the payload pools contained oligonucleotides corresponding to 100-500 unique left and right homology arms, while only 10-19 homology arms were assembled in a given reaction, such that a certain fraction of insert oligonucleotides would be inert in the reaction. The mixture further comprised the NEB HiFi DNA assembly master mix and a cloning vector for propagation in an E. coli cloning strain. Each mixture contained about 0.05 pmol of the respective loci pool, 0.2-1 pmol of the respective payload pool, and 0.0125 pmol of the cloning vector, theoretically resulting in 0.0125 pmol of final assembled product. Once combined, each of the mixtures was subjected to the NEB HiFi DNA Assembly protocol for in vitro overlap assembly and propagated in an E. coli cloning strain. The number of unique loci (pairs of homology arms), payloads, and total possible constructs in each library is given in the table in FIG. 4.
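The reaction amounts quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the quoted figures (0.05 pmol loci pool, 0.2-1 pmol payload pool, 0.0125 pmol vector) are pool totals; the function name `reaction_summary` and its dictionary keys are hypothetical and used only for illustration.

```python
def reaction_summary(loci_pmol: float, payload_pmol: float, vector_pmol: float) -> dict:
    """Summarize payload excess and the limiting part for one assembly mixture."""
    amounts = {
        "loci pool": loci_pmol,
        "payload pool": payload_pmol,
        "vector": vector_pmol,
    }
    # The part present in the smallest molar amount caps the theoretical yield.
    limiting = min(amounts, key=amounts.get)
    return {
        "payload_to_loci": payload_pmol / loci_pmol,
        "limiting_part": limiting,
        "max_product_pmol": amounts[limiting],
    }


if __name__ == "__main__":
    # Lower and upper ends of the payload range quoted in the text.
    for payload in (0.2, 1.0):
        print(reaction_summary(loci_pmol=0.05, payload_pmol=payload, vector_pmol=0.0125))
```

Under these assumptions the payload-to-loci ratio spans about 4:1 to 20:1, which brackets the stated ratio of roughly 10:1, and the cloning vector is the limiting part, consistent with the stated theoretical yield of 0.0125 pmol.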

Following propagation, at least 100 colonies per assembly were picked separately into liquid culture and grown overnight. The liquid culture was used as template for a rolling-circle amplification (RCA) of the entire cloned plasmid. The RCA product was fragmented using a Tn5 transposase fragmentation and adapter-ligation kit (Nextera, Illumina). Sample-specific indexing barcodes were then added via PCR, and plasmids from the libraries were then pooled and column purified. Library molarity was assessed with a Tapestation instrument and plasmids from the libraries were loaded onto a MiSeq instrument (300 cycle kit) for sequencing. The number of plasmids sequenced for each assembly is shown in FIG. 4.

To determine what assembly had been generated in the pool of assemblies, algorithms were used to search for unique 20-mer sequences for each junction between parts in each unique assembly in the raw sequencing reads in order to identify which DNA sequence was assembled. The reads were then mapped to the corresponding reference sequence for each sample in order to determine the full-length products created. In the plot in FIG. 4, ‘sequence-perfect’ means that all four parts (vector backbone, left and right homology arms, and payload) were assembled together and there were no mutations in the plasmid. ‘Correct assembly with mutations’ indicates the presence of all four parts in the correct arrangement but with one or more point mutations in the plasmid. ‘Misassembly’ indicates plasmids with mispaired homology arms, or part(s) or portions of parts not present.
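The junction search described above might look something like the following minimal sketch. The algorithms actually used are not disclosed, so everything here is an assumption made for illustration: the function names (`junction_kmers`, `identify_assembly`), the representation of each design as an ordered list of part sequences around a circular plasmid, and the use of exact substring matching on both strands. The sketch also does not verify that each 20-mer is unique across designs, and each part is assumed to be at least k/2 bases long.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def junction_kmers(parts: list, k: int = 20) -> list:
    """k-mers centred on each part-to-part boundary of a circular assembly."""
    half = k // 2
    kmers = []
    for i, part in enumerate(parts):
        nxt = parts[(i + 1) % len(parts)]  # wrap around: the plasmid is circular
        kmers.append(part[-half:] + nxt[:half])
    return kmers


def identify_assembly(reads: list, designs: dict, k: int = 20) -> list:
    """Return names of designs whose junction k-mers all occur in the reads."""
    searchable = list(reads) + [reverse_complement(r) for r in reads]
    hits = []
    for name, parts in designs.items():
        if all(any(kmer in read for read in searchable)
               for kmer in junction_kmers(parts, k)):
            hits.append(name)
    return hits


if __name__ == "__main__":
    # Toy parts (hypothetical sequences): vector, left arm, payload, right arm.
    vector, left, right = "TTGACAGCTAGCTCAGTCCT", "ATGGCTAGCAAAGGAGAAGA", "CTGTTCACCGGTGTTGTCCC"
    designs = {
        "A": [vector, left, "GATTACAGATTACAGATTAC", right],
        "B": [vector, left, "CCGGAACCGGAACCGGAACC", right],
    }
    plasmid_a = "".join(designs["A"])
    read = plasmid_a + plasmid_a[:20]  # one long read that spans the wrap-around junction
    print(identify_assembly([read], designs))
```

Requiring every junction of a design to be present means a plasmid with mispaired homology arms or a missing part matches no design, which corresponds to the ‘misassembly’ category; distinguishing ‘sequence-perfect’ from ‘correct assembly with mutations’ would still need the read-mapping step.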

Conclusion

The results shown in FIG. 4 indicate that the process depicted in FIG. 1 can be successfully utilized to generate deterministic libraries of DNA assemblies.

Example 2—Proof of Principle of Method for Multiplexed Deterministic Assembly with Large Payloads Using Circular-Permuted Payload

Objective

This example describes the use of an in vitro assembly reaction as shown schematically in FIG. 1 to deterministically join a pool comprising precisely designed DNA parts including a circular-permuted payload (insert) whose preparation is described in FIG. 3.

Methods/Results

Insert was prepared by amplifying a template comprising the payload sequence (~2670 bp) using a pooled forward primer comprising, from 5′ to 3′ end, an assembly overlap to the right side of the payload, 53 bp of HOM2, an I-SceI restriction endonuclease recognition site, 53 bp of HOM1, and a primer binding site at the left side of the payload. The amplification product was excised from an agarose gel using a Qiagen gel extraction kit. The excised product was circularized in an NEB HiFi assembly reaction, purified via paramagnetic bead-based clean-up (e.g., an AXYPREP mag bead cleanup), and linearized using I-SceI (the ‘circular permutation’).
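The circular permutation can be followed at the sequence level with a small string model. This is a minimal sketch under stated assumptions: the function names (`build_amplicon`, `circularize`, `linearize`) are hypothetical, the recognition sequence used is the commonly cited 18-bp I-SceI site, and the enzyme is modelled as a blunt cut at an assumed offset within that site, ignoring the overhang it actually leaves. Toy sequences stand in for the ~2670 bp payload and the 53 bp HOM1 and HOM2 segments.

```python
ISCEI_SITE = "TAGGGATAACAGGGTAAT"  # commonly cited 18-bp I-SceI recognition sequence


def build_amplicon(payload, hom1, hom2, overlap_len=20, site=ISCEI_SITE):
    """Amplicon from the tailed forward primer: overlap | HOM2 | site | HOM1 | payload."""
    right_overlap = payload[-overlap_len:]  # copy of the right end of the payload
    return right_overlap + hom2 + site + hom1 + payload


def circularize(amplicon, overlap_len=20):
    """Join the two ends through their shared overlap, keeping one copy of it."""
    if amplicon[:overlap_len] != amplicon[-overlap_len:]:
        raise ValueError("amplicon ends do not share an assembly overlap")
    return amplicon[overlap_len:]  # circular sequence written from an arbitrary origin


def linearize(circle, site=ISCEI_SITE, cut_offset=9):
    """Open the circle inside the recognition site (blunt cut; overhang ignored)."""
    doubled = circle + circle  # lets the search span the arbitrary origin
    start = doubled.find(site)
    if start == -1:
        raise ValueError("recognition site not found")
    cut = (start + cut_offset) % len(circle)
    return circle[cut:] + circle[:cut]


if __name__ == "__main__":
    payload = "ATGAAACGCATTAGCACCACCATTACCACC"  # toy payload (hypothetical)
    hom1, hom2 = "AAGCTTGCAT", "CCTGCAGGTC"      # toy homology segments (hypothetical)
    amplicon = build_amplicon(payload, hom1, hom2, overlap_len=10)
    linear = linearize(circularize(amplicon, overlap_len=10))
    print(linear)  # site half + HOM1 + payload + HOM2 + site half
```

In this model, the HOM2 and HOM1 segments that sit side by side in the primer end up on opposite sides of the payload after circularization and cutting, which is how a single tailed primer can place sequence at both ends of a long insert.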

Separately, left and right homology arms that each comprised sequence complementary to a separate locus in the genome of a Saccharomyces cerevisiae host cell were amplified from genomic DNA.

The circular-permuted pooled payload and the set of left and right homology arms were combined with a cloning vector and assembled using an NEB HiFi reaction. The mixture contained a molar ratio of left homology arms:pooled insert:right homology arms of roughly 1:5:1. An excess of the insert was used because the pooled insert contained 54 unique sequences compared to the ten pairs of left/right homology arms used in the assembly. The mixture contained about 16 fmol of the hom arm pools, 80 fmol of the payload pool, and 2.5 fmol of the cloning vector, theoretically resulting in 2.5 fmol of final assembled product. Once combined, the mixture was subjected to the NEB HiFi DNA Assembly protocol for in vitro overlap assembly and propagated in an E. coli cloning strain.

Following propagation, several colonies were picked separately into liquid culture and grown overnight. The liquid culture was used as template for a rolling-circle amplification (RCA) of the entire cloned plasmid. The RCA product was sequenced with two primers (one forward and one reverse) by Sanger sequencing. The primers were designed to bind on the cloning vector and read into the hom arms and payload.

To determine which assembly had been generated in the pool of assemblies, algorithms were used to search for unique 20-mer sequences in each unique assembly in the sequencing reads. The reads were then aligned to the corresponding reference sequence for each sample in order to verify that the intended junctions were created, indicating the plasmid had assembled correctly. In FIG. 5, the long bars at the top indicate the structure of the plasmids to be assembled in the pool, and the shorter bars below represent Sanger sequences aligned to the corresponding reference sequence for three separate samples from the pool of assemblies. Faint vertical lines at the inner ends of the reads represent expected sequencing artifacts at the tail ends of Sanger reads. The data indicate that all of the intended junctions were assembled.
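The 20-mer search described above could be implemented along the following lines. This Python sketch is an assumption about one reasonable approach, not the algorithm actually used: it collects, for each reference assembly, the k-mers found in no other reference, then assigns a read to the reference whose unique k-mers it contains most often, checking both strands. All function names are hypothetical, and the subsequent alignment step is not shown.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq, k=20):
    """Set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(references, k=20):
    """Map each reference name to the k-mers found in no other reference."""
    per_ref = {name: kmers(seq, k) for name, seq in references.items()}
    counts = Counter()
    for ks in per_ref.values():
        counts.update(ks)
    return {name: {km for km in ks if counts[km] == 1}
            for name, ks in per_ref.items()}


def identify(read, signatures, k=20):
    """Name of the reference sharing the most unique k-mers with the read, or None."""
    read_kmers = kmers(read, k) | kmers(revcomp(read), k)
    hits = {name: len(sig & read_kmers) for name, sig in signatures.items()}
    best = max(hits, key=hits.get, default=None)
    return best if best is not None and hits[best] > 0 else None


# Toy references with a shared prefix; k is shortened to 6 for readability.
refs = {"asm1": "ACGTTGCAAGGCTA", "asm2": "ACGTTGCATTCAGG"}
sigs = unique_kmers(refs, k=6)
print(identify("TGCAAGGCT", sigs, k=6))
print(identify(revcomp("GCATTCAGG"), sigs, k=6))
print(identify("ACGTTGCA", sigs, k=6))
```

A read that covers only sequence shared by several assemblies (for example, vector backbone) yields no assignment, which is why reads in this scheme need to extend into the homology arms or payload.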

Conclusion

The results shown in FIG. 5 indicate that the processes depicted in FIG. 1 and FIG. 3 can be successfully utilized to generate deterministic libraries of DNA assemblies incorporating long payloads (e.g., >200 bp).

Example 3—Proof of Principle of Method for Multiplexed Deterministic Assembly with Large Payloads Using PCR-Amplified Payload

Methods/Results

FIG. 6 shows total success rates for pooled assemblies for which payload-containing parts were created via PCR, using primers that append the assembly overlaps, from templates derived from a host genome. Amplified payloads ranged from 182 to 213 bp in length. PCR-amplified parts with appended 25 bp assembly overlaps were purified via a magnetic bead-based protocol and subsequently normalized, along with left and right parts containing homology arms, to 0.25 picomoles total for payload, left and right parts separately and 0.05 picomoles of vector backbone. Assembly was then performed using NEB's HiFi assembly master mix, and the products were electroporated into electrocompetent cells. Success rates were calculated as the percentage of attempted plasmids that were recovered and passed NGS-QC. Success rates were based on recovering and sequencing the following numbers of unique plasmids for each pool: pool 1: 48/70 plasmids; pool 2: 47/70 plasmids; pool 3: 46/70 plasmids; pool 4: 56/70 plasmids; pool 5: 37/49 plasmids. For pools 1-4, 7 promoter payloads were targeted to 10 loci, and for pool 5, 7 promoter payloads were targeted to 7 loci. Collectively, across all five pools, 234/329 (71.12%) of plasmids were created and NGS-confirmed.
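The success-rate arithmetic above can be reproduced with a short Python sketch. The counts are those reported in this example; the variable and function names are illustrative, and the number of attempted plasmids per pool is taken to be the number of payloads multiplied by the number of loci.

```python
# (recovered and NGS-confirmed, payloads, loci) for each pool, as reported above
pools = {
    "pool 1": (48, 7, 10),
    "pool 2": (47, 7, 10),
    "pool 3": (46, 7, 10),
    "pool 4": (56, 7, 10),
    "pool 5": (37, 7, 7),
}


def success_rate(recovered, attempted):
    """Percentage of attempted plasmids that were recovered and confirmed."""
    return 100.0 * recovered / attempted


total_recovered = 0
total_attempted = 0
for name, (recovered, payloads, loci) in pools.items():
    attempted = payloads * loci  # every payload is targeted to every locus
    total_recovered += recovered
    total_attempted += attempted
    print(f"{name}: {recovered}/{attempted} = {success_rate(recovered, attempted):.2f}%")

overall = success_rate(total_recovered, total_attempted)
print(f"overall: {total_recovered}/{total_attempted} = {overall:.2f}%")
```

The per-pool rates range from about 66% (pool 3) to 80% (pool 4), and the overall figure matches the 234/329 total stated above.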

Conclusion

The results shown in FIG. 6 indicate that the process depicted in FIG. 1 can be successfully utilized with PCR-amplified insert fragments to generate deterministic libraries of DNA assemblies incorporating long payloads (e.g., >200 bp).

SEQUENCES OF THE DISCLOSURE WITH SEQ ID NO IDENTIFIERS

                    NUCLEIC ACID
Name                SEQ ID NO:    Description
pMB070_promoter     1             Payload sequence
b2405_promoter      2             Payload sequence
b0605_promoter      3             Payload sequence
pMB043_promoter     4             Payload sequence
pMB071_promoter     5             Payload sequence
b0159_promoter      6             Payload sequence
pMB090_promoter     7             Payload sequence
pool_1_b3748_left   8             Pool 1-Left Homology Arm
pool_1_b3748_right  9             Pool 1-Right Homology Arm
pool_1_b0388_left   10            Pool 1-Left Homology Arm
pool_1_b0388_right  11            Pool 1-Right Homology Arm
pool_1_b4348_left   12            Pool 1-Left Homology Arm
pool_1_b4348_right  13            Pool 1-Right Homology Arm
pool_1_b1982_left   14            Pool 1-Left Homology Arm
pool_1_b1982_right  15            Pool 1-Right Homology Arm
pool_1_b4367_left   16            Pool 1-Left Homology Arm
pool_1_b4367_right  17            Pool 1-Right Homology Arm
pool_1_b2285_left   18            Pool 1-Left Homology Arm
pool_1_b2285_right  19            Pool 1-Right Homology Arm
pool_1_b2405_left   20            Pool 1-Left Homology Arm
pool_1_b2405_right  21            Pool 1-Right Homology Arm
pool_1_b0495_left   22            Pool 1-Left Homology Arm
pool_1_b0495_right  23            Pool 1-Right Homology Arm
pool_1_b1646_left   24            Pool 1-Left Homology Arm
pool_1_b1646_right  25            Pool 1-Right Homology Arm
pool_1_b3189_left   26            Pool 1-Left Homology Arm
pool_1_b3189_right  27            Pool 1-Right Homology Arm
pool_2_b3125_left   28            Pool 2-Left Homology Arm
pool_2_b3125_right  29            Pool 2-Right Homology Arm
pool_2_b3787_left   30            Pool 2-Left Homology Arm
pool_2_b3787_right  31            Pool 2-Right Homology Arm
pool_2_b1948_left   32            Pool 2-Left Homology Arm
pool_2_b1948_right  33            Pool 2-Right Homology Arm
pool_2_b2790_left   34            Pool 2-Left Homology Arm
pool_2_b2790_right  35            Pool 2-Right Homology Arm
pool_2_b3197_left   36            Pool 2-Left Homology Arm
pool_2_b3197_right  37            Pool 2-Right Homology Arm
pool_2_b3791_left   38            Pool 2-Left Homology Arm
pool_2_b3791_right  39            Pool 2-Right Homology Arm
pool_2_b4260_left   40            Pool 2-Left Homology Arm
pool_2_b4260_right  41            Pool 2-Right Homology Arm
pool_2_b0071_left   42            Pool 2-Left Homology Arm
pool_2_b0071_right  43            Pool 2-Right Homology Arm
pool_2_b1687_left   44            Pool 2-Left Homology Arm
pool_2_b1687_right  45            Pool 2-Right Homology Arm
pool_2_b1006_left   46            Pool 2-Left Homology Arm
pool_2_b1006_right  47            Pool 2-Right Homology Arm
pool_3_b0335_left   48            Pool 3-Left Homology Arm
pool_3_b0335_right  49            Pool 3-Right Homology Arm
pool_3_b1940_left   50            Pool 3-Left Homology Arm
pool_3_b1940_right  51            Pool 3-Right Homology Arm
pool_3_b0109_left   52            Pool 3-Left Homology Arm
pool_3_b0109_right  53            Pool 3-Right Homology Arm
pool_3_b3399_left   54            Pool 3-Left Homology Arm
pool_3_b3399_right  55            Pool 3-Right Homology Arm
pool_3_b2478_left   56            Pool 3-Left Homology Arm
pool_3_b2478_right  57            Pool 3-Right Homology Arm
pool_3_b0320_left   58            Pool 3-Left Homology Arm
pool_3_b0320_right  59            Pool 3-Right Homology Arm
pool_3_b4521_left   60            Pool 3-Left Homology Arm
pool_3_b4521_right  61            Pool 3-Right Homology Arm
pool_3_b2260_left   62            Pool 3-Left Homology Arm
pool_3_b2260_right  63            Pool 3-Right Homology Arm
pool_3_b4169_left   64            Pool 3-Left Homology Arm
pool_3_b4169_right  65            Pool 3-Right Homology Arm
pool_3_b2405_left   66            Pool 3-Left Homology Arm
pool_3_b2405_right  67            Pool 3-Right Homology Arm
pool_4_b3493_left   68            Pool 4-Left Homology Arm
pool_4_b3493_right  69            Pool 4-Right Homology Arm
pool_4_b0479_left   70            Pool 4-Left Homology Arm
pool_4_b0479_right  71            Pool 4-Right Homology Arm
pool_4_b2470_left   72            Pool 4-Left Homology Arm
pool_4_b2470_right  73            Pool 4-Right Homology Arm
pool_4_b1451_left   74            Pool 4-Left Homology Arm
pool_4_b1451_right  75            Pool 4-Right Homology Arm
pool_4_b1981_left   76            Pool 4-Left Homology Arm
pool_4_b1981_right  77            Pool 4-Right Homology Arm
pool_4_b0237_left   78            Pool 4-Left Homology Arm
pool_4_b0237_right  79            Pool 4-Right Homology Arm
pool_4_b2497_left   80            Pool 4-Left Homology Arm
pool_4_b2497_right  81            Pool 4-Right Homology Arm
pool_4_b4260_left   82            Pool 4-Left Homology Arm
pool_4_b4260_right  83            Pool 4-Right Homology Arm
pool_4_b1412_left   84            Pool 4-Left Homology Arm
pool_4_b1412_right  85            Pool 4-Right Homology Arm
pool_4_b4139_left   86            Pool 4-Left Homology Arm
pool_4_b4139_right  87            Pool 4-Right Homology Arm
pool_4_b2039_left   88            Pool 4-Left Homology Arm
pool_4_b2039_right  89            Pool 4-Right Homology Arm
pool_4_b4473_left   90            Pool 4-Left Homology Arm
pool_4_b4473_right  91            Pool 4-Right Homology Arm
pool_4_b3510_left   92            Pool 4-Left Homology Arm
pool_4_b3510_right  93            Pool 4-Right Homology Arm
pool_4_b1007_left   94            Pool 4-Left Homology Arm
pool_4_b1007_right  95            Pool 4-Right Homology Arm
pool_4_b3058_left   96            Pool 4-Left Homology Arm
pool_4_b3058_right  97            Pool 4-Right Homology Arm
pool_4_b2688_left   98            Pool 4-Left Homology Arm
pool_4_b2688_right  99            Pool 4-Right Homology Arm
pool_4_b1716_left   100           Pool 4-Left Homology Arm
pool_4_b1716_right  101           Pool 4-Right Homology Arm
pool_4_b3071_left   102           Pool 4-Left Homology Arm
pool_4_b3071_right  103           Pool 4-Right Homology Arm
pool_4_b2139_left   104           Pool 4-Left Homology Arm
pool_4_b2139_right  105           Pool 4-Right Homology Arm
pool_5_b2434_left   106           Pool 5-Left Homology Arm
pool_5_b2434_right  107           Pool 5-Right Homology Arm
pool_5_b2037_left   108           Pool 5-Left Homology Arm
pool_5_b2037_right  109           Pool 5-Right Homology Arm
pool_5_b2451_left   110           Pool 5-Left Homology Arm
pool_5_b2451_right  111           Pool 5-Right Homology Arm
pool_5_b1902_left   112           Pool 5-Left Homology Arm
pool_5_b1902_right  113           Pool 5-Right Homology Arm
pool_5_b4310_left   114           Pool 5-Left Homology Arm
pool_5_b4310_right  115           Pool 5-Right Homology Arm
pool_5_b0676_left   116           Pool 5-Left Homology Arm
pool_5_b0676_right  117           Pool 5-Right Homology Arm
pool_5_b1497_left   118           Pool 5-Left Homology Arm
pool_5_b1497_right  119           Pool 5-Right Homology Arm
pool_5_b0183_left   120           Pool 5-Left Homology Arm
pool_5_b0183_right  121           Pool 5-Right Homology Arm
pool_5_b3631_left   122           Pool 5-Left Homology Arm
pool_5_b3631_right  123           Pool 5-Right Homology Arm
pool_5_b3791_left   124           Pool 5-Left Homology Arm
pool_5_b3791_right  125           Pool 5-Right Homology Arm
pool_5_b0438_left   126           Pool 5-Left Homology Arm
pool_5_b0438_right  127           Pool 5-Right Homology Arm
pool_5_b1981_left   128           Pool 5-Left Homology Arm
pool_5_b1981_right  129           Pool 5-Right Homology Arm
pool_5_b1709_left   130           Pool 5-Left Homology Arm
pool_5_b1709_right  131           Pool 5-Right Homology Arm
pool_5_b2176_left   132           Pool 5-Left Homology Arm
pool_5_b2176_right  133           Pool 5-Right Homology Arm
pool_5_b2168_left   134           Pool 5-Left Homology Arm
pool_5_b2168_right  135           Pool 5-Right Homology Arm
pool_5_b1872_left   136           Pool 5-Left Homology Arm
pool_5_b1872_right  137           Pool 5-Right Homology Arm
pool_5_b1203_left   138           Pool 5-Left Homology Arm
pool_5_b1203_right  139           Pool 5-Right Homology Arm
pool_5_b2231_left   140           Pool 5-Left Homology Arm
pool_5_b2231_right  141           Pool 5-Right Homology Arm
pool_5_b1622_left   142           Pool 5-Left Homology Arm
pool_5_b1622_right  143           Pool 5-Right Homology Arm
pool_6_b1857_left   144           Pool 6-Left Homology Arm
pool_6_b1857_right  145           Pool 6-Right Homology Arm
pool_6_b4024_left   146           Pool 6-Left Homology Arm
pool_6_b4024_right  147           Pool 6-Right Homology Arm
pool_6_b3942_left   148           Pool 6-Left Homology Arm
pool_6_b3942_right  149           Pool 6-Right Homology Arm
pool_6_b0592_left   150           Pool 6-Left Homology Arm
pool_6_b0592_right  151           Pool 6-Right Homology Arm
pool_6_b1415_left   152           Pool 6-Left Homology Arm
pool_6_b1415_right  153           Pool 6-Right Homology Arm
pool_6_b1762_left   154           Pool 6-Left Homology Arm
pool_6_b1762_right  155           Pool 6-Right Homology Arm
pool_6_b3414_left   156           Pool 6-Left Homology Arm
pool_6_b3414_right  157           Pool 6-Right Homology Arm
pool_6_b4374_left   158           Pool 6-Left Homology Arm
pool_6_b4374_right  159           Pool 6-Right Homology Arm
pool_6_b2917_left   160           Pool 6-Left Homology Arm
pool_6_b2917_right  161           Pool 6-Right Homology Arm
pool_6_b0346_left   162           Pool 6-Left Homology Arm
pool_6_b0346_right  163           Pool 6-Right Homology Arm
pool_6_b3966_left   164           Pool 6-Left Homology Arm
pool_6_b3966_right  165           Pool 6-Right Homology Arm
pool_6_b0406_left   166           Pool 6-Left Homology Arm
pool_6_b0406_right  167           Pool 6-Right Homology Arm
pool_6_b0652_left   168           Pool 6-Left Homology Arm
pool_6_b0652_right  169           Pool 6-Right Homology Arm
pool_6_b1493_left   170           Pool 6-Left Homology Arm
pool_6_b1493_right  171           Pool 6-Right Homology Arm
pool_6_b4159_left   172           Pool 6-Left Homology Arm
pool_6_b4159_right  173           Pool 6-Right Homology Arm
pool_6_b3795_left   174           Pool 6-Left Homology Arm
pool_6_b3795_right  175           Pool 6-Right Homology Arm
pool_6_b4246_left   176           Pool 6-Left Homology Arm
pool_6_b4246_right  177           Pool 6-Right Homology Arm
pool_6_b4440_left   178           Pool 6-Left Homology Arm
pool_6_b4440_right  179           Pool 6-Right Homology Arm

Numbered Embodiments of the Disclosure

Other subject matter contemplated by the present disclosure is set out in the following numbered embodiments:

1. A composition comprising a mixture of polynucleotides, the mixture comprising:

a first pool containing pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide; and

a second pool of insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to a 3′ end of a first polynucleotide and a second assembly overlap sequence at its opposing 3′ end that is complementary to a 5′ end of a second polynucleotide in a pair of polynucleotides from the first pool.

2. The composition of embodiment 1, further comprising a cloning vector, wherein, for each pair in the first pool, a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide comprise sequence complementary to the cloning vector.

3. The composition of embodiment 2, wherein each polynucleotide from the first pool is selected such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector.

4. The composition of embodiment 3, wherein the specified threshold is between 5 and 15 contiguous nucleotides.

5. The composition of any one of embodiments 1-4, further comprising a polymerase.

6. The composition of embodiment 5, wherein the polymerase is strand-displacing or non-strand displacing.

7. The composition of embodiment 6, wherein the polymerase is non-strand displacing and the composition further comprises a crowding agent.

8. The composition of embodiment 7, wherein the crowding agent is polyethylene glycol (PEG).

9. The composition of embodiment 8, wherein the PEG is used at a concentration of from about 3 to about 7% (weight/volume).

10. The composition of embodiment 8 or 9, wherein the PEG is selected from PEG-200, PEG-4000, PEG-6000, PEG-8000 or PEG-20,000.

11. The composition of embodiment 6, wherein the polymerase is strand displacing and the composition further comprises a single-stranded binding protein.

12. The composition of embodiment 11, wherein the single strand DNA binding protein is an extreme thermostable single-stranded DNA binding protein (ET SSB), E. coli recA, T7 gene 2.5 product, phage lambda RedB or Rac prophage RecT.

13. The composition of any one of the above embodiments, further comprising a 5′-3′ exonuclease.

14. The composition of any one of the above embodiments, further comprising a ligase.

15. The composition of any one of the above embodiments, wherein each pair in the first pool is double-stranded DNA (dsDNA) or single-stranded DNA (ssDNA).

16. The composition of any one of the above embodiments, wherein each insert polynucleotide in the second pool is dsDNA or ssDNA.

17. The composition of any one of the above embodiments, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to a target genomic locus in a host cell.

18. The composition of any one of embodiments 1-16, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a gene that is part of a metabolic pathway.

19. The composition of any one of embodiments 1-18, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a functional domain or one or more proteins.

20. The composition of any one of the above embodiments, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide are linked together in a single construct, wherein the single construct comprises one or more recognition sequences for one or more site-specific nucleases between the first polynucleotide and the second polynucleotide.

21. The composition of embodiment 20, wherein the one or more recognition sequences for one or more site-specific nucleases comprise a homing endonuclease recognition sequence.

22. The composition of any one of the above embodiments, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise 1 or more nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool.

23. The composition of any one of the above embodiments, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise about 25 nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool.

24. The composition of any one of the above embodiments, wherein each insert polynucleotide in the second pool comprises one or more payload sequences located between the first assembly overlap sequence and the second assembly overlap sequence.

25. The composition of embodiment 24, wherein the one or more payload sequences are selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, terminators, unique identifier sequence or portions thereof.

26. The composition of embodiment 17, wherein each pair of first and second polynucleotides in the first pool comprises sequence corresponding to a different target genomic locus in a host cell as compared to each other pair in the first pool.

27. The composition of embodiment 17, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to the same target genomic locus in a host cell.

28. The composition of any one of embodiments 24-27, wherein each payload sequence in the insert polynucleotides in the second pool is different from the payload sequence in each other insert polynucleotide in the second pool.

29. The composition of any one of embodiments 24-27, wherein each payload sequence in the insert polynucleotides in the second pool is the same as the payload sequence in each other insert polynucleotide in the second pool.

30. A method for generating libraries of polynucleotides, the method comprising:

(a) combining a first pool of polynucleotides and a second pool of polynucleotides, wherein the first pool contains pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide, wherein the second pool contains insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to a 3′ end of a first polynucleotide and a second assembly overlap sequence at its opposing 3′ end that is complementary to a 5′ end of a second polynucleotide in a pair of polynucleotides from the first pool; and

(b) assembling the first pool and the second pool into a library of polynucleotides, wherein each polynucleotide in the library comprises an insert polynucleotide from the second pool and a pair of first polynucleotides and second polynucleotides from the first pool, wherein the assembling is performed via in vitro cloning methods or in vivo cloning methods.

31. The method of embodiment 30, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise 1 or more nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool.

32. The method of embodiment 30 or 31, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise about 25 nucleotides that are complementary to the 3′ end of a first polynucleotide and the 5′ end of a second polynucleotide, respectively, in a pair of polynucleotides from the first pool.

33. The method of any one of embodiments 30-32, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide are linked together in a single construct, wherein the single construct comprises one or more recognition sequences for one or more site-specific nucleases between the first polynucleotide and the second polynucleotide.

34. The method of embodiment 33, wherein the one or more recognition sequences for one or more site-specific nucleases comprise a homing endonuclease recognition sequence.

35. The method of embodiment 33, wherein the linked single construct is produced by joining individual first and second polynucleotides via splicing and overlap-extension PCR (SOE-PCR), restriction-ligation, blunt-end ligation, overlap-based assembly method, recombination-based method, or any other enzymatic or chemical method of joining the first and second polynucleotides, or by synthesizing the single construct directly.

36. The method of any one of embodiments 30-32, further comprising combining a cloning vector with the first pool and the second pool during step (a), wherein opposing ends of the cloning vector comprise sequence complementary to a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide for each pair in the first pool.

37. The method of any one of embodiments 30-32, further comprising combining a cloning vector with the first pool prior to step (a), wherein opposing ends of the cloning vector comprise sequence complementary to a 5′ end of the first polynucleotide and a 3′ end of the second polynucleotide for each pair in the first pool.

38. The method of embodiment 36 or 37, wherein the cloning vector and the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair from the first pool comprise one or more recognition sequences for one or more site-specific nucleases.

39. The method of embodiment 38, further comprising generating single-stranded complementary overhangs between the opposing ends of the cloning vector and the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair from the first pool by adding the one or more site-specific nucleases for the one or more recognition sequences.

40. The method of embodiment 39, further comprising ligating the single-stranded complementary overhangs between the opposing ends of the cloning vector and the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair from the first pool.

41. The method of any one of embodiments 36-40, wherein step (b) results in a circular product comprising an insert polynucleotide from the second pool, a first and second polynucleotide from a pair from the first pool and the cloning vector.

42. The method of any one of embodiments 36-41, wherein the first pool is generated by selecting pairs of polynucleotide sequences from a larger set of such sequences such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector.

43. The method of embodiment 42, wherein the specified threshold is between 5 and 15 contiguous nucleotides.

44. The method of any one of embodiments 30-43, wherein the assembly is an in vitro cloning method, wherein the mixture of the first pool and the second pool is heated to partially or fully denature polynucleotides present in the first and the second pools, then cooled to room temperature before assembly.

45. A method for generating libraries of polynucleotides, the method comprising:

(a) amplifying via polymerase chain reaction (PCR) a first pool of polynucleotides, wherein the first pool contains pairs of polynucleotides, wherein each pair in the first pool contains a first polynucleotide and a second polynucleotide, and wherein each first polynucleotide and each second polynucleotide in a pair comprises a 5′ end and a 3′ end, wherein the amplifying introduces a common overlap sequence comprising one or more recognition sequences for one or more site-specific nucleases onto the 5′ end of a first polynucleotide and the 3′ end of a second polynucleotide in a pair from the first pool;

(b) assembling each pair of first polynucleotides and second polynucleotides from the first pool into a single nucleic acid fragment by utilizing the common overlap sequence, wherein the single nucleic acid fragment for each pair comprises a first polynucleotide and second polynucleotide separated by the common overlap sequence from the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide, and wherein the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment for each pair are located on opposing terminal ends of the single nucleic acid fragment, distal to the one or more site-specific nuclease recognition sequence(s);

(c) combining the single nucleic acid fragments for each pair with a second pool containing insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to the 3′ end of the first polynucleotide present within the single nucleic acid fragment and a second assembly overlap sequence at its opposing 3′ end that is complementary to the 5′ end of the second polynucleotide present within the single nucleic acid fragment;

(d) assembling the first pool and the second pool into a third pool of circularized products, wherein the assembling is performed via in vitro or in vivo overlap assembly methods, and wherein each circularized product in the third pool comprises an insert sequence from the second pool and a pair of first polynucleotides and second polynucleotides from the first pool;

(e) linearizing each circularized product in the third pool via digestion by one or more site-specific nuclease(s) that recognizes the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence in each of the circularized products in the third pool; and

(f) assembling the linearized products into cloning vectors by in vitro or in vivo cloning methods.

46. The method of embodiment 45, wherein the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence is a homing nuclease recognition sequence.

47. The method of embodiment 45 or 46, wherein the one or more site-specific nuclease(s) for the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence is a homing endonuclease.

48. The method of any one of embodiments 45-47, wherein the common overlap sequence comprises an assembly overlap sequence of at least 1 nucleotide and the assembly in step (b) is performed by an overlap-based DNA assembly method.

49. The method of any one of embodiments 45-47, wherein the common overlap sequence comprises an assembly overlap sequence of from 10-25 nucleotides and the assembly in step (b) is performed by an overlap-based DNA assembly method.

50. The method of embodiment 48 or 49, wherein the overlap-based DNA assembly method is selected from SOE-PCR or an in vitro overlap-assembly method.

51. The method of embodiment 50, wherein the one or more site-specific nuclease recognition sequence(s) present in the common overlap sequence on the 5′ end of the first polynucleotide is complementary to the one or more site-specific nuclease recognition sequence(s) present in the common overlap sequence on the 3′ end of the second polynucleotide in each pair, and wherein the utilizing the common overlap sequences of the first and second polynucleotides in each pair in step (b) entails performing SOE-PCR.

52. The method of any one of embodiments 45-47, wherein the utilizing the common overlap sequences of the first and second polynucleotides in each pair in step (b) entails digesting the one or more site-specific nuclease recognition sequences present in the common overlap sequence on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair with one or more site specific nucleases for the one or more site-specific nuclease recognition sequences to generate single-stranded overhangs on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair that comprise complementary sequence; and ligating the complementary sequence present on the single-stranded overhang on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair.

53. The method of any one of embodiments 45-52, wherein the assembling of step (d) is performed using an overlap-based DNA assembly method.

54. The method of embodiment 53, wherein the overlap-based DNA assembly is selected from SOE-PCR and an in vitro overlap-assembly method.

55. The method of any one of embodiments 45-52, wherein the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair comprise an additional set of one or more site-specific nuclease recognition sequences and the first assembly overlap sequence and the second assembly overlap sequence in each insert polynucleotide in the second pool comprise one or more site-specific nuclease recognition sequences.

56. The method of embodiment 55, wherein the assembling in step (d) entails digesting the additional one or more site-specific nuclease recognition sequences present on the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair and the one or more site-specific nuclease recognition sequences present in the first and second assembly sequences in each insert polynucleotide from the second pool with one or more site specific nucleases for the additional one or more site-specific nuclease recognition sequences on the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair and the one or more site-specific nuclease recognition sequences present in the first and second assembly sequences in each insert polynucleotide from the second pool to generate a single-stranded overhang on the 3′ end of the first polynucleotide that comprises sequence complementary to sequence present on a single-stranded overhang on the 5′ end of the first assembly sequence of an insert polynucleotide from the second pool and a single stranded overhang on the 5′ end of the second polynucleotide that comprises sequence complementary to a sequence present on a single-stranded overhang on the 3′ end of the second assembly sequence of the same insert polynucleotide from the second pool; and ligating the complementary sequence present on the single-stranded overhangs.

57. The method of any one of embodiments 45-56, wherein the cloning vectors of step (f) comprise one or more site-specific nuclease recognition sequences.

58. The method of embodiment 57, wherein the assembling in step (f) entails digesting the one or more site-specific nuclease recognition sequences in the cloning vectors with the one or more site-specific nucleases for the one or more site-specific nuclease recognition sequences present in the cloning vectors, wherein the digesting generates single-stranded overhangs on opposing ends of the cloning vectors, wherein the single-stranded overhang on one of the opposing ends of the cloning vector comprises sequence complementary to an end of the linearized product generated in step (e) and the single-stranded overhang on the other of the opposing ends of the cloning vectors comprises sequence complementary to an opposing end of the linearized product generated in step (e); and ligating the complementary sequences present on the single-stranded overhangs of the cloning vectors and the linearized products from step (e).

59. The method of any one of embodiments 45-58, wherein the first pool is generated by selecting pairs of polynucleotide sequences from a larger set of such sequences such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector.

60. The method of embodiment 59, wherein the specified threshold is between 5 and 15 contiguous nucleotides.
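The selection criterion of embodiments 59 and 60 can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions, not as the disclosed implementation: "beyond a specified threshold" is read here as sharing a contiguous run longer than the threshold on either strand, designed assembly overlaps are assumed to have been trimmed from the sequences beforehand, and the function names and the greedy selection order are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def kmers(seq, k):
    """Return the set of all contiguous k-nucleotide windows in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shares_run(a, b, threshold):
    """True if a and b share a contiguous run longer than `threshold`
    nucleotides, checking b on both strands (an assumption of this sketch)."""
    k = threshold + 1
    windows_a = kmers(a, k)
    return bool(windows_a & kmers(b, k)) or bool(
        windows_a & kmers(reverse_complement(b), k)
    )


def select_compatible(candidates, threshold=10):
    """Greedy filter over (name, sequence) tuples: keep a candidate only if
    it shares no run beyond the threshold with any sequence already kept.
    Designed overlaps are assumed to be excluded from `sequence` already."""
    kept = []
    for name, seq in candidates:
        if all(not shares_run(seq, other, threshold) for _, other in kept):
            kept.append((name, seq))
    return kept
```

A greedy pass is only one possible strategy; it depends on candidate order and does not guarantee the largest compatible subset.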

61. The method of any one of embodiments 45-60, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise 1 or more nucleotides that are complementary to the opposing terminal ends of the single nucleic acid fragment.

62. The method of any one of embodiments 45-61, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise about 25 nucleotides that are complementary to the opposing terminal ends of the single nucleic acid fragment.
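The relationship between the assembly overlap sequences and the terminal ends of the single nucleic acid fragment can be sketched as a simple string check. This is a simplified model and not the disclosed method: all sequences are written as the top strand of double-stranded DNA, so "complementary" is modeled as identity between the insert's terminal overlap and the fragment's terminal sequence, and the function names and the default overlap length of 25 are illustrative assumptions.

```python
def match_inserts(fragment, inserts, n=25):
    """Return names of inserts whose 5' overlap equals the fragment's 3'
    terminal n nucleotides and whose 3' overlap equals the fragment's 5'
    terminal n nucleotides (top-strand model, an assumption of this sketch).
    `inserts` maps a hypothetical insert name to its sequence."""
    return [
        name
        for name, seq in inserts.items()
        if len(seq) >= 2 * n
        and seq[:n] == fragment[-n:]
        and seq[-n:] == fragment[:n]
    ]


def circular_product(fragment, insert, n=25):
    """Write the circularized product as a linear string starting at the
    fragment's 5' end, counting each shared overlap only once."""
    assert insert[:n] == fragment[-n:] and insert[-n:] == fragment[:n]
    return fragment + insert[n:-n]
```

Because each fragment is matched only by the insert carrying both of its terminal overlaps, a pooled reaction modeled this way yields one product per designed pairing rather than every combination.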

63. The method of any one of embodiments 30-62, wherein, prior to step (a), the first pool of polynucleotides is generated by combining a mixture containing each first polynucleotide from the pairs of polynucleotides with a mixture containing each second polynucleotide from the pairs of polynucleotides.

64. The method of any one of embodiments 30-63, wherein each pair in the first pool is double-stranded DNA (dsDNA) or single-stranded DNA (ssDNA).

65. The method of any one of embodiments 30-43, wherein each insert polynucleotide in the second pool is dsDNA or ssDNA.

66. The method of any one of embodiments 30-65, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to a target genomic locus in a host cell.

67. The method of any one of embodiments 30-65, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a gene that is part of a metabolic pathway.

68. The method of any one of embodiments 30-65, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise coding sequence corresponding to a functional domain or one or more proteins.

69. The method of any one of embodiments 30-68, wherein each insert polynucleotide in the second pool comprises one or more payload sequences located between the first assembly overlap sequence and the second assembly overlap sequence.

70. The method of embodiment 69, wherein the one or more payload sequences are selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, terminators, unique identifier sequence or portions thereof.

71. The method of embodiment 66, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to a different target genomic locus in a host cell as compared to each other pair in the first pool.

72. The method of embodiment 66, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to the same target genomic locus in a host cell.

73. The method of any one of embodiments 69-72, wherein each payload sequence in the insert polynucleotides in the second pool is different from the payload sequence in each other insert polynucleotide in the second pool.

74. The method of any one of embodiments 69-72, wherein each payload sequence in the insert polynucleotides in the second pool is the same as the payload sequence in each other insert polynucleotide in the second pool.

75. The method of any one of embodiments 30 or 69-74, wherein each insert polynucleotide in the second pool is generated by:

(i) performing a polymerase chain reaction (PCR) on a mixture comprising the payload sequence, a forward primer and a reverse primer, wherein the forward primer comprises from 5′ to 3′, a short stretch of one or more nucleotides complementary to the payload sequence, the first assembly overlap sequence, one or more recognition sequences for one or more site-specific nucleases, the second assembly overlap sequence and a second stretch of one or more nucleotides complementary to the payload sequence and wherein the reverse primer comprises sequence complementary to the payload sequence or to other sequence downstream of the payload sequence, wherein the PCR generates a PCR product comprising from 5′ to 3′, the short stretch of nucleic acid complementary to the payload sequence, the first assembly overlap sequence, the one or more site-specific nuclease recognition sequence(s), the second assembly overlap sequence and the payload sequence;

(ii) circularizing the PCR product via an assembly method selected from the group consisting of splicing and overlap-extension PCR (SOE-PCR), restriction-ligation, blunt-end ligation, overlap based assembly method and recombination-based method, or any other enzymatic or chemical method of joining two DNA molecules; and

(iii) linearizing the circularized PCR product with one or more site-specific nuclease(s) that recognize the one or more site-specific nuclease recognition sequence(s), thereby generating the second pool of polynucleotides.
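Steps (i) to (iii) of embodiment 75 can be followed with a toy string model. The sketch below is illustrative only and rests on several assumptions that are not taken from the disclosure: sequences are written as the top strand, the short 5′ stretch is assumed to copy the last bases of the payload so that the two ends of the PCR product are homologous, circularization is modeled as merging that shared stretch, and the nuclease is modeled as excising its entire recognition sequence (real enzymes cut at enzyme-specific positions and may leave overhangs). All function and parameter names are hypothetical.

```python
def build_pcr_product(payload, overlap1, site, overlap2, stretch_len=6):
    """Step (i): 5' short stretch, first overlap, nuclease site,
    second overlap, payload. The short stretch is assumed to equal the
    payload's last `stretch_len` bases (an assumption of this sketch)."""
    short_stretch = payload[-stretch_len:]
    return short_stretch + overlap1 + site + overlap2 + payload


def circularize(product, stretch_len=6):
    """Step (ii): join the homologous ends, keeping one copy of the shared
    stretch. The circle is written as a string from an arbitrary origin."""
    assert product[:stretch_len] == product[-stretch_len:]
    return product[stretch_len:]


def linearize(circular, site):
    """Step (iii): open the circle at the single nuclease site. The site is
    removed entirely here, which simplifies real cleavage chemistry."""
    i = circular.find(site)
    assert i != -1, "site not found"
    assert circular.find(site, i + 1) == -1, "site must be unique"
    return circular[i + len(site):] + circular[:i]
```

In this model the assembly overlaps, which start out adjacent to each other in the primer, end up at opposite termini of the linearized insert with the payload between them.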

76. The composition or method of any one of the above embodiments, wherein the site-specific nuclease(s) is one or more of restriction endonuclease(s), Type IIs endonuclease(s), homing endonuclease(s), RNA-guided nuclease(s), DNA-guided nuclease(s), zinc-finger nuclease(s), TALEN(s) or nicking enzyme(s).

The various embodiments described above can be combined to provide further embodiments. All of the U.S. patents, U.S. patent application publications, U.S. patent applications, foreign patents, foreign patent applications and non-patent publications referred to in this specification and/or listed in the Application Data Sheet are incorporated herein by reference, in their entirety. Aspects of the embodiments can be modified, if necessary, to employ concepts of the various patents, applications and publications to provide yet further embodiments.

These and other changes can be made to the embodiments in light of the above-detailed description. In general, in the following claims, the terms used should not be construed to limit the claims to the specific embodiments disclosed in the specification and the claims, but should be construed to include all possible embodiments along with the full scope of equivalents to which such claims are entitled. Accordingly, the claims are not limited by the disclosure.

INCORPORATION BY REFERENCE

All references, articles, publications, patents, patent publications, and patent applications cited herein are incorporated by reference in their entireties for all purposes. However, mention of any reference, article, publication, patent, patent publication, and patent application cited herein is not, and should not be taken as an acknowledgment or any form of suggestion that they constitute valid prior art or form part of the common general knowledge in any country in the world.

1.-28. (canceled)
29. A method for generating libraries of polynucleotides, the method comprising: (a) amplifying via polymerase chain reaction (PCR), a first pool of polynucleotides, wherein the first pool contains pairs of polynucleotides, wherein each pair contains a first polynucleotide and a second polynucleotide, and wherein each first polynucleotide and each second polynucleotide in a pair comprises a 5′ end and a 3′ end, wherein the amplifying introduces a common overlap sequence comprising one or more recognition sequences for one or more site-specific nucleases onto the 5′ end of a first polynucleotide and the 3′ end of a second polynucleotide in a pair from the first pool; (b) assembling each pair of first polynucleotides and second polynucleotides from the first pool into a single nucleic acid fragment by utilizing the common overlap sequence, wherein the single nucleic acid fragment for each pair comprises a first polynucleotide and second polynucleotide separated by the common overlap sequence from the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide, and wherein the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment for each pair are located on opposing terminal ends of the single nucleic acid fragment, distal to the one or more site-specific nuclease recognition sequence(s); (c) combining the single nucleic acid fragments for each pair with a second pool containing insert polynucleotides, wherein each insert polynucleotide in the second pool comprises a first assembly overlap sequence at its 5′ end that is complementary to the 3′ end of the first polynucleotide present within the single nucleic acid fragment and a second assembly overlap sequence at its opposing 3′ end that is complementary to the 5′ end of the second polynucleotide present within the single nucleic acid fragment; (d) assembling the first pool and the second pool into a third pool of circularized products, wherein the assembling is performed via in vitro or in vivo overlap assembly methods, and wherein each circularized product in the third pool comprises an insert sequence from the second pool and a pair of first polynucleotides and second polynucleotides from the first pool; (e) linearizing each circularized product in the third pool via digestion by one or more site-specific nuclease(s) that recognize(s) the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence in each of the circularized products in the third pool; and (f) assembling the linearized products into cloning vectors by in vitro or in vivo cloning methods.
30. The method of claim 29, wherein the first pool is generated by selecting pairs of polynucleotide sequences from a larger set of such sequences such that no polynucleotide from the first pool shares common sequence with any other polynucleotide from the first pool beyond a specified threshold, excluding designed assembly overlap sequences between the pairs of polynucleotides of the first pool and the insert polynucleotides of the second pool, or the pairs of polynucleotides of the first pool and the cloning vector.
31. The method of claim 30, wherein the specified threshold is between 5 and 15 contiguous nucleotides.
32. The method of claim 29, wherein the one or more site-specific nuclease recognition sequence(s) located between the first polynucleotide sequence and the second polynucleotide sequence is a homing nuclease recognition sequence and the one or more site-specific nuclease(s) is a homing endonuclease.
33. The method of claim 29, wherein the common overlap sequence comprises an assembly overlap sequence of at least 1 nucleotide and the assembly in step (b) is performed by an overlap-based DNA assembly method.
34. The method of claim 33, wherein the overlap-based DNA assembly method is selected from splicing and overlap-extension PCR (SOE-PCR) or an in vitro overlap-assembly method.
35. The method of claim 34, wherein the one or more site-specific nuclease recognition sequence(s) present in the common overlap sequence on the 5′ end of the first polynucleotide is complementary to the one or more site-specific nuclease recognition sequence(s) present in the common overlap sequence on the 3′ end of the second polynucleotide in each pair, and wherein the utilizing the common overlap sequences of the first and second polynucleotides in each pair in step (b) entails performing SOE-PCR.
36. The method of claim 29, wherein the utilizing the common overlap sequences of the first and second polynucleotides in each pair in step (b) entails digesting the one or more site-specific nuclease recognition sequences present in the common overlap sequence on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair with one or more site-specific nucleases for the one or more site-specific nuclease recognition sequences to generate single-stranded overhangs on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair that comprise complementary sequence; and ligating the complementary sequence present on the single-stranded overhang on the 5′ end of the first polynucleotide and the 3′ end of the second polynucleotide in each pair.
37. The method of claim 29, wherein the assembling of step (d) is performed using an overlap-based DNA assembly method.
38. The method of claim 29, wherein the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair comprise an additional set of one or more site-specific nuclease recognition sequences and the first assembly overlap sequence and the second assembly overlap sequence in each insert polynucleotide in the second pool comprise one or more site-specific nuclease recognition sequences.
39. The method of claim 38, wherein the assembling in step (d) entails digesting the additional one or more site-specific nuclease recognition sequences present on the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair and the one or more site-specific nuclease recognition sequences present in the first and second assembly sequences in each insert polynucleotide from the second pool with one or more site-specific nucleases for the additional one or more site-specific nuclease recognition sequences on the 3′ end of the first polynucleotide and the 5′ end of the second polynucleotide in the single nucleic acid fragment in each pair and the one or more site-specific nuclease recognition sequences present in the first and second assembly sequences in each insert polynucleotide from the second pool to generate a single-stranded overhang on the 3′ end of the first polynucleotide that comprises sequence complementary to sequence present on a single-stranded overhang on the 5′ end of the first assembly sequence of an insert polynucleotide from the second pool and a single-stranded overhang on the 5′ end of the second polynucleotide that comprises sequence complementary to a sequence present on a single-stranded overhang on the 3′ end of the second assembly sequence of the same insert polynucleotide from the second pool; and ligating the complementary sequence present on the single-stranded overhangs.
40. The method of claim 29, wherein the cloning vectors of step (f) comprise one or more site-specific nuclease recognition sequences.
41. The method of claim 40, wherein the assembling in step (f) entails digesting the one or more site-specific nuclease recognition sequences in the cloning vectors with the one or more site-specific nucleases for the one or more site-specific nuclease recognition sequences present in the cloning vectors, wherein the digesting generates single-stranded overhangs on opposing ends of the cloning vectors, wherein the single-stranded overhang on one of the opposing ends of the cloning vector comprises sequence complementary to an end of the linearized product generated in step (e) and the single-stranded overhang on the other of the opposing ends of the cloning vectors comprises sequence complementary to an opposing end of the linearized product generated in step (e); and ligating the complementary sequences present on the single-stranded overhangs of the cloning vectors and the linearized products from step (e).
42. The method of claim 29, wherein the first assembly overlap sequence and the second assembly overlap sequence on each insert polynucleotide in the second pool comprise 1 or more nucleotides that are complementary to the opposing terminal ends of the single nucleic acid fragment.
43. The method of claim 29, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to a target genomic locus in a host cell.
44. The method of claim 29, wherein each insert polynucleotide in the second pool comprises one or more payload sequences located between the first assembly overlap sequence and the second assembly overlap sequence.
45. The method of claim 44, wherein the one or more payload sequences are selected from promoters, genes, regulatory sequences, nucleic acid sequence encoding degrons, nucleic acid sequence encoding solubility tags, terminators, unique identifier sequence or portions thereof.
46. The method of claim 43, wherein, for each pair in the first pool, the first polynucleotide and the second polynucleotide comprise sequence corresponding to a different target genomic locus in a host cell as compared to each other pair in the first pool.
47. The method of claim 44, wherein each payload sequence in the insert polynucleotides in the second pool is different from the payload sequence in each other insert polynucleotide in the second pool.
48. The method of claim 44, wherein each insert polynucleotide in the second pool is generated by: (i) performing PCR on a mixture comprising the payload sequence, a forward primer and a reverse primer, wherein the forward primer comprises from 5′ to 3′, a short stretch of one or more nucleotides complementary to the payload sequence, the first assembly overlap sequence, one or more recognition sequences for one or more site-specific nucleases, the second assembly overlap sequence and a second stretch of one or more nucleotides complementary to the payload sequence and wherein the reverse primer comprises sequence complementary to the payload sequence or to other sequence downstream of the payload sequence, wherein the PCR generates a PCR product comprising from 5′ to 3′, the short stretch of nucleic acid complementary to the payload sequence, the first assembly overlap sequence, the one or more site-specific nuclease recognition sequence(s), the second assembly overlap sequence and the payload sequence; (ii) circularizing the PCR product via an assembly method selected from the group consisting of SOE-PCR, restriction-ligation, blunt-end ligation, overlap based assembly method and recombination-based method, or any other enzymatic or chemical method of joining two DNA molecules; and (iii) linearizing the circularized PCR product with one or more site-specific nuclease(s) that recognize the one or more site-specific nuclease recognition sequence(s), thereby generating the second pool of polynucleotides.